MPCS_51221-Lab

Lab 9 Details for MPCS 56600

Each lab will consist of a small problem and details of how to proceed. Each lab is intended to give every student hands-on experience with the core concepts and technologies covered during the course. A student may concentrate, as a team member, on one technology over another for the final project, but labs are designed to give each and every student exposure to all the technologies that come into play. You need to submit labs to the TAs for grading--see submission instructions below. Generally, unless otherwise specified, you will have one week to complete each assigned lab.

See the syllabus for information on grading. Turning in lab assignments on time is required, without exception, and all late deliveries will be penalized, regardless of cause. Submit your assignments to the subversion repository according to the directions on the syllabus page.

Lab 9 Due: 4:00 pm, August 13, 2024

Problem 1: Distributing your Blockchain using Docker Networking and Docker Compose:

BACKGROUND:

Like all programming problems, learning a new technology is not an exercise in reading but rather an exercise in thinking and typing. This lab is designed to give you hands-on experience in some fundamental skills involved in Docker containerization. You will generally find the References section below helpful in addition to the required and recommended reading. When we talk about "Docker", we are talking specifically about the Stable Community Edition of Docker, which is the version we will be using in this class. The Stable Community Edition provides the basic container engine and built-in orchestration, networking, and security.

In this lab problem, you will distribute your blockchain across multiple Docker containers and have them all inter-communicate in terms of broadcasting transactions and mining blocks.

WHAT YOU NEED TO DO:

First, let's just pause for a moment and understand where you are and where you are going. Right now, through Lab 8 you have a program (or two) that (a) generates new transactions and adds them to your blockchain and (b) mines them. All you're going to do here is distribute this program so that the same program runs on multiple Docker containers ("full node peers"), so that multiple transactions are coming from multiple containers, and multiple containers are mining the transactions they hear about.

Make sure you have read and understood the implications of Chapter 8, The Bitcoin Network, of Antonopoulos.

As usual, anything not specifically prescribed or proscribed in the following instructions is left to your discretion (aka creative license) as to exactly how to implement.

Details:

The first thing you will need to decide is what distributed communication mechanism you are going to use. Of course, you can choose to simply work with raw sockets. This will prove engaging but also challenging, as you may have to worry about endianness, structural boundaries, string termination, communication faults, and all sorts of other things that may come into play. That said, communicating over raw sockets requires nothing other than your programming language and its libraries.

The other options include Java RMI, Google gRPC, Windows Communication Foundation WCF/.NET, or any other mechanism you wish (you may NOT choose a shared file system such as NFS, etc., where you only really have a "single" blockchain that every peer jointly "sees") that will allow for the exchange of serialized "string" of data over the network between containers. The exact mechanism of data sharing is entirely up to you, but in general, you will need to be able to serialize your transactions into some form of "byte stream", and transmit that stream between containers. Using a mechanism such as gRPC or RMI will go a long way towards simplifying the handling of the serialization of your data. Note your mileage may vary when trying to serialize an actual Transaction object as opposed to a simple string. This will be left to you to ponder. That said, the only thing you will really need to transfer are Transactions and Blocks and a few handshaking structures.

Vocabulary: When we speak of a "Node" below, we mean a Docker container. When we speak of a "Full Node", we mean a docker container running an instance of your Blockchain Mining Software you produced for Lab 8, now newly distributed (as described below)

STEP 1 (Time T₀):

Figure out what default port your bitcoin network will establish initial communication over, ie, the handshake port. You can use any ephemeral PORT, such as 58333 (bitcoin's default for Mainnet is 8333), or 12345, etc. Your choice.

In one docker container, run a docker container whose "--name" is "DNS_SEED", which will contain a program (Registrar) which will listen on the ephemeral PORT you have chosen (e.g., 58333, 12345, etc.), i.e., whose only job is to listen for incoming connections (via whatever technology you are using for network connections--raw sockets, RMI, WCF, gRPC, etc.). For your docker image, you are free to chose any docker image base that makes sense to you, an ubuntu-based image is always a good bet.

STEP 2 (Time T₁):

Each Full Node runs in a docker container that is linked to the DNS_SEED node in its docker run command (--link DNS_SEED).

The Registrar program running in the DNS_SEED node does not actually have to be running your blockchain mining software (a Full Node)...it can be a simple network server (running RMI, gRPC, WCF, raw socket) whose job it is to listen (on your ephemeral PORT) for incoming connections and to return a list of {0...1} "Full Nodes" on your coin network that have already registered with the Registrar program running on the DNS_SEED node. Note that the first Full Node that registers (at time T₁) will receive null/0 nodes back from the DNS_SEED; however, its own IP address will have been registered with the DNS_SEED. You can think of the DNS_SEED node as acting as if it is a DNS service. A Full Node (such as Node A) as we are describing it is a distributed version of your blockchain mining software you developed for Lab 8.

The first network call a newly-launched Full Node makes is a call to the DNS_SEED's register() function/method. The Full Node initializes and sends (via the register() function) a serialized structure/object [Registration] that contains the following information:

struct/class Registration {
nVersion: The version of your mining software, defaults for this Lab to 1.
nTime: The current time on the joining node
addrMe: The IP address of the joining node (e.g., "172.17.0.3" on your docker subnet)
}

Upon receipt, the DNS_SEED records this information in a list, especially noting the IP address of the registering Full Node in an internal datastore (can be non-persistent, i.e., an in-memory list or an actual database (e.g., MySQL, SQLExpress, etc., ... your choice).

Remember that it is the job of the Registrar to return a list of zero or one nodes ({0...1} "Full Nodes") that have previously registered with it (the Registrar always returns the latest registered node in its list or null/0 if it has no nodes in its list). In our example, as of T₁, this is the very first Full Node registering with the DNS_SEED, so the Registrar running on the DNS_SEED will return zero or a null list (your call on what that means) indicating that this registering node is the first node to register.

After a Full Node (let's call this Node A...you can consider it anything you wish) first communicates with the DNS_SEED node and is registered, each Full Node launches its distributed blockchain mining server which begins to patiently listen for network calls to its handshake() function/method (explained below).

STEP 3 (Time T₂):

As time goes by, a new Full Node (let's call this Node B) will run in a new docker container (again, always linked to DNS_SEED). This new Full Node will do the same thing as the previous Full Node: it will first register (Step 1 in the drawing) with the DNS_SEED server (as described above).

Once the newly-launched Full Node has registered (Step 1 in the drawing) with the Registrar running in the DNS_SEED (in our example this is Node "B"), if the node gets a non-null response from the DNS_SEED (which in this example it would, because the Registrar running on DNS_SEED would have returned the IP address for Node A, its last-registered node), the new Full Node ("B") will then send a handshake() message (i.e., call the handshake() function/method) on Node A's IP address it received back from the Registrar running in the DNS_SEED (Step 2 in our drawing), seeking to communicate with that IP address whose server (distributed blockchain mining software) is listening on port (say) 58333. The handshake() function/method takes in a Handshake class/structure from the calling peer that resembles the Registration structure/class described above, viz.:

struct/class Handshake {
nVersion: The version of your mining software, defaults for this lab to 1.
nTime: The current time on the calling node
addrMe: The IP address of the calling node (e.g., "172.17.0.5" on the docker subnet)
bestHeight: The height of the blockchain tip as this node knows it (can be the height of the genesis block)
}

Note the addition of the "bestHeight" field, which is the height of the tip of Node B's blockchain (can be 0 if it only has the genesis block). On receipt of a handshake() call, the receiving Full Node ("A") will record the IP address of the caller (Handshake.addrMe which will be Node B's IP address) in a list of its known peers (your call as to that form/structure). The list of known peers is simply a list of IP addresses a Full Node maintains of other Full Nodes that have "shook hands" with it, i.e., called its handshake() method and provided their IP address via Handshake.addrMe.

In returning from the handshake() call, the receiving Full Node (the node whose Handshake() function was called, in this case, Node "A") will return to the calling node ("B") a list of all the nodes that it knows about, i.e., it's list of known peers, (at this time, that list will contain only the IP of node "B", because Node A does not know of any other nodes other than the handshaking node "B"). We will leave it to you to determine the structure of the returned list of known peers. Node B will recognize its own IP returned from node A's list of known peers, and will not act further on it's own IP address.

STEP 4 (Time T₃):

Node A now has a single known peer: Node B. As time goes by, a new Full Node (let's call this Node C) will run in a new docker container (again, always linked to DNS_SEED). This new Full Node will do the same thing as the previous Full Nodes: it will first register with the DNS_SEED server (as described above, Step 1 in our drawing).

Once the new Full Node has registered with the DNS_SEED, if a node gets a non-null response from the DNS_SEED (which in this example it would, because the DNS_SEED would have returned the IP address for Node B (we can assume that the DNS_SEED simply returns it's "latest" addition to the network), the new Full Node ("C") will send a handshake() message (i.e., call the handshake() function/method) on the IP address it received back (it only receives one IP node back from the DNS_SEED), seeking to handshake() with that IP address whose server (distributed blockchain mining software) is listening on port (say) 58333, as described above (Step 2 in our drawing), which in our example, would be node "B".

Once the third newly-launched Full Node has registered with the DNS_SEED and has conducted a handshake with Node B (from which it learned of Node A because node "B" returned its list of known peers, which includes node "A"), the newly-launched Full Node ("C") will also handshake with any other peer nodes it has learned from the node it just shook hands with (in our case, Node "B"), and thus it will call the handshake() function/method on Node "A" providing Node A it's IP Address (that of Node "C").

Note that a Full Node will only call every given node's handshake() function/method once...i.e., it will maintain knowledge of which nodes it has already shaken hands with (your call on how to handle this...i.e., a second list of handshakes or a "handshake" flag on the list of known peers, etc.). If a Full Node discovers (via known peers) a Full Node with which it has not already shaken hands, it will call that node's handshake() function/method. In this way, all nodes will have eventually shaken hands with each other and it is never the case that a given node either (a) calls it's own handshake() function/method or (b) calls another Full Node's handshake() function/method more than once.

STEP 5 (Time T₄):

Once the third newly-launched Full Node has registered with the DNS_SEED and has conducted a handshake with Nodes B and now Node A (Node C learned of Node A from it's handshake with Node B), each node will now know of a minimum of two other nodes (known peers) on the network. Upon learning of the existence of Node C, Node A will in turn call Node C's handshake() method, seeking to discover Node C's list of known peers, which it will add to its own list of known peers.

Note that in our example, we are not limiting the number of handshakes with other nodes because we are intentionally keeping it simple. In a real blockchain implementation, any given node would not seek to know of "all" nodes on the network, just a select subset. We are not now concerned with minimizing network traffic.

STEP 6 (Time T_5..n):

At this point, each node's mining activity can begin (your call on how this state is recognized...). The distributed mining process will entail the simulated generation of new Transactions (as you have been doing in previous labs) and the attempted mining of a new block (as you did in Lab 8). As each new mining node "discovers" a new transaction (periodic generation...your mileage may vary and you may feel free to tweak this, but start out with each node discovering a new transaction about every 3-5 seconds...try to make this random by alternatively sleeping within the bounds of 2-5 seconds--see python example below), it adds the newly-created transaction to its own memory pool and then broadcasts that transaction to all the other nodes on the network it knows about (by calling newTransactionBroadcast() on the other Full Nodes).

So for example, if Node C "discovers" (generates) a new transaction, it will broadcast that transaction to the other nodes using something like the following structure/class:

struct/class NewTransaction {
serialized transaction data
}

Each Full Node is listening for NewTransactionBroadcast() calls on your ephemeral PORT (or other port depending on your network solution, ie., RMI, WCF, etc.), and when a node receives a new transaction from the network, it checks first if the new transaction is already in it's memory pool of transactions, and if it's not, it will add the new transaction to its working memory pool of transactions it is adding to the block it is trying to mine. For this lab, we will assume that all transactions broadcast are valid transactions. The node then will propagate (gossip) the new transaction by calling NewTransactionBroadcast on all nodes in its list of known nodes. A given node will never broadcast a given transaction more than once.

The difficulty you will use in your mining for this lab will be 0x1e200000 which will average a successful new block generation with the correct nonce of between 1 and 17 seconds (μ ~ 7 seconds with one miner). 0x1e200000 will yield a hex target of 0x0000200000000000000000000000000000000000000000000000000000000000 (decimal: 220855883097298041197912187592864814478435487109452369765200775161577472).

Once a new block is mined by a Full Node, it will immediately broadcast this new block (with all of its transaction IDs) as a new structure or class:

struct/class NewBlockMined {
serialized block data of successful nonce (which will include Block and Header)
}

as: NewTransactionBroadcast(myNewBlockMined)

As other nodes receive the newly-mined block solution. For this lab, we will assume that all new blocks broadcast are valid. The node then will propagate (gossip) the new block by calling NewBlockBroadcast(newBlock) on all nodes in its list of known nodes. A given node will never broadcast a given block more than once to each known node. Upon receipt of a new block broadcast, each mining node will simply add the new block to its blockchain. Remember, each new Block is added on (bound to) to a specific prior block off of which the miner is working, which may be the genesis block in the case of the first mined block.

Once a miner publishes a newly-mined block, each miner (including the miner that produced the correct result) will sleep for a random number of seconds (as integers) between 0 and 3 seconds, i.e., in the range {0..3}. As an example, the python command print(random.randint(0,3)) will produce this result. This will reduce the chance that miners will all produce blocks at "about the same time", since we assume all Full Nodes are running on the same physical processor(s) (as all containers are likely running on the same physical machine) and they will tend (through the round-robin load-balancing of the OS scheduler) to produce results at about the same time. Adding this randomness will reduce this likelihood and produce more "clean winners", which should reduce the chance of forks, which we would like to avoid in this lab.

You may feel free to add further nodes if you wish, this is up to you if you're curious, but not required. A Full Node is only to connect to a MAX of two other nodes in your network.

FINAL STEP:

Run it and debug it.

You will likely discover "timing-related" or "order-related" issues as you begin to run and debug your distributed solution. You will need to think about issues, and how you will handle them, including:

What if I receive a transaction I already know about? (ignore it?)
What if I'm mining a block and I receive a new transaction? (add it and reset your nonce to 0?)
What data structure will I use to hold my "working set" of unspent transactions? (lists and maps are always a good choice)
How will I serialize my transactions for networked communication? (language-dependent)
How will I serialize my newly-mined blocks for networked communication? (network-framework dependent, ie, Java RMI, Windows C# WCF, python sockets, etc.)
How do you handle multiple methods/functions listening on a single port (e.g., 58333)? Or do you establish each listener on a separate port (i.e., use multiple ports)? This may depend on the method of communication you choose (RMI, gRPC, WCF, sockets, etc.). You are completely free to make this decision as you see fit...whatever is easier.
How will you handle forks which might occur? You don't have to worry about handling forks unless you are encountering them.
How do you govern the "gossiping" between the nodes...that is to say, how do you make sure that transactions and blocks are not re-broadcast over and over again?

To prepare for submission, create a tarball of your DNS_SEED container and one of your Full Nodes for Problem 3:

Commit a new image of both using a command similar to (use of course the actual names you've used for this problem below):

docker commit DNS_SEED dns_seed:lab7
docker commit FULL_NODE full_node:lab7

Then, save the images to a tarball:

$ docker save -o DNSSeed.tar dns_seed:lab7
$ docker save -o FullNode.tar full_node:lab7

Then, zip them up:

$ bzip2 DNSSeed.tar
[be patient...may take a few secs]
$ bzip2 FullNode.tar
[be patient...may take a few secs]

Now scp your two bzip'd files to your [userid] directory under:

y /stage/MPCS56600/

Submit source code that you created for Problem 3 and a README as usual into your lab7 repo directory as described below.

References:

You may find the following references helpful (in addition to the links from previous labs):

General gRPC Tutorial Links

java [https://grpc.io/docs/tutorials/basic/java.html]
C++ [https://grpc.io/docs/tutorials/basic/c.html#generating-client-and-server-code]
go [https://grpc.io/docs/quickstart/go.html#install-protocol-buffers-v3]
python [https://grpc.io/docs/tutorials/basic/python.html]

gRPC Guides: https://grpc.io/docs/guides/
gRPC Tutorials: https://grpc.io/docs/tutorials/
C++: https://grpc.io/docs/quickstart/cpp.html
Java: https://grpc.io/docs/quickstart/java.html
Go: https://grpc.io/docs/quickstart/go.html
Python: https://grpc.io/docs/quickstart/python.html

Java RMI:

Java RMI and Object Serialization FAQ
Tutorial's Point RMI Tutorial
Oracle's RMI Getting Started Guide
JavaTpoint's RMI Guide

Windows WCF:

WCF Tutorial
Microsoft Tutorial
Code Project's Beginner's Guide to WCF

Python-related RPC mechanisms:

Python Pickle:
Python Docs Pickle
Python UsingPickle
Python Pickle DataCamp

Python JSON-RPC:
Python PyPi JSON-RPC
Python JSON-RPC Github example (not tested...)
Python JSON-RPC on jsonrpc.org
Another example

Other Examples of RPC mechanisms:

Go:
Go example
Medium's article

Haskell:
The World's Dumbest RPC Example in Haskell (untested)
Hackage example of JSON-RPC in Haskell

General Docker Tutorial Links

Docker Cheat Sheet
Learn Docker in 12 Minutes Youtube
Demystifying Docker Youtube
TutorialsPoint: Docker Tutorial for Absolute Beginners
Docker Overview
Ubuntu package commands

Submission:

For Labs:

Each student should submit their LabN files and any supporting materials to the repo. Please include a README text file that contains any instructions or additional details for the TAs to assist with grading.

Setup:

When you open the invitation URL in a browser tab, you will have to complete the following steps:

Github may ask you to grant permission for GitHub Classroom to have access to your GitHub account. If you are asked this question, you should say yes and grant access to Github Classroom.

You must click “Accept this assignment” or your repository will not actually be created. Do not skip this step!

If you run into any issues, please ask for help on Ed.