Module communication


Code to spin up the communication mesh for a Timely cluster.

The heart of this code is create_sockets, which establishes connections with the other processes (“peers”) in the Timely cluster. This procedure needs to be fault-tolerant: if one or more processes restart while connections are being established, the cluster must not be left in a stalled state in which the processes can never finish setting up their connections.

A Timely cluster assumes reliable networking among all processes in the cluster and forces its processes to crash if this condition is not met. It is therefore impossible to set up a working Timely cluster in the presence of persistent process or network failures. However, we assume that any period of instability eventually resolves. create_sockets is written to ensure that once things are stable again, processes can correctly establish connections among each other.

If a process returns from create_sockets with one or more sockets that are connected to processes that have crashed, that process will also crash upon initializing its side of the Timely cluster. We can say that processes connected to crashed processes are “doomed”. Additionally, all processes connected to doomed processes are doomed themselves, as their doomed peer will eventually crash, forcing them to crash too. We need to avoid a circle of doom where new processes perpetually connect to doomed processes, become doomed themselves, doom other processes that connect to them, and then crash.
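The way doom spreads can be illustrated with a small, self-contained sketch. It is not part of this module: the function name `doomed` and the representation of connections as index pairs are assumptions made purely for illustration. Starting from the crashed processes, it repeatedly marks every live process connected to a crashed or doomed one.

```rust
use std::collections::{BTreeSet, VecDeque};

/// Illustrative only: returns the live processes that are doomed, given the
/// established connections (as unordered index pairs) and the set of crashed
/// processes. Doom spreads transitively along connections.
pub fn doomed(connections: &[(usize, usize)], crashed: &BTreeSet<usize>) -> BTreeSet<usize> {
    let mut doomed = BTreeSet::new();
    // Start the search from every crashed process.
    let mut queue: VecDeque<usize> = crashed.iter().copied().collect();

    while let Some(p) = queue.pop_front() {
        for &(a, b) in connections {
            // Find the process on the other end of a connection touching `p`.
            let neighbor = if a == p {
                b
            } else if b == p {
                a
            } else {
                continue;
            };
            // A live neighbor that was not yet marked becomes doomed and
            // will in turn doom its own neighbors.
            if !crashed.contains(&neighbor) && doomed.insert(neighbor) {
                queue.push_back(neighbor);
            }
        }
    }
    doomed
}
```

With connections 0-1, 1-2 and 3-4 and process 0 crashed, processes 1 and 2 come out doomed while 3 and 4 are unaffected.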

create_sockets avoids the circle of doom by ensuring that a new generation of processes does not connect to the previous generation. We pessimistically assume that the entire previous generation has been doomed and to successfully connect we need to spin up an entire new generation. This approach can cause us to force restarts of non-doomed processes and therefore leaves some efficiency on the floor, but we are more concerned about our ability to reason about the system than about cluster startup time.

To differentiate between generations, we rely on an epoch, i.e., a number that increases between process restarts. Unfortunately, we don’t have a way to get a perfect epoch here, so we use the system time instead. Note that the system time is not guaranteed to always increase, but as long as it increases eventually, we will eventually succeed in establishing connections between processes.
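As a rough sketch of what minting an epoch from the system time could look like, consider the following. The `Epoch` struct here is a hypothetical stand-in for the module's private type, whose actual fields are not shown on this page; the millisecond resolution and the `mint` method name are likewise assumptions.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical stand-in for the module's private `Epoch` type.
/// Deriving `Ord` lets epochs be compared directly during the protocol.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Epoch(u64);

impl Epoch {
    /// Mints an epoch from the system clock, in milliseconds since the
    /// Unix epoch (an assumed resolution, chosen for illustration).
    pub fn mint() -> Self {
        let millis = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            // A clock set before 1970 yields an error; fall back to zero.
            .map(|d| d.as_millis() as u64)
            .unwrap_or(0);
        Epoch(millis)
    }
}
```

Because the system clock can jump backwards, two successive calls to `Epoch::mint()` are not guaranteed to be ordered, which is why the protocol only relies on the time increasing eventually.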

Each process performs the following protocol:

  • Let my_index be the index of the process in the Timely cluster.
  • If my_index == 0, mint a new my_epoch. Otherwise leave my_epoch uninitialized.
  • For each index < my_index:
    • Connect to the peer at index.
    • Receive peer_epoch.
    • If my_epoch is uninitialized, set my_epoch to peer_epoch.
    • Send my_epoch.
    • Compare epochs:
      • my_epoch < peer_epoch: fail the protocol
      • my_epoch > peer_epoch: retry the connection
      • my_epoch == peer_epoch: connection successfully established
  • Until a connection has been established with all peers:
    • Accept a connection from a peer at index > my_index.
    • If a connection to the peer at that index was already established, fail the protocol.
    • Send my_epoch.
    • Receive peer_epoch.
    • Compare epochs and react as above.
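The epoch-comparison step shared by both halves of the protocol can be sketched as a pure function. The names `Handshake`, `compare_epochs`, and `adopt_epoch` are illustrative assumptions, and epochs are modeled as plain `u64` values; the module's actual types and signatures are not shown on this page.

```rust
use std::cmp::Ordering;

/// Outcome of one epoch exchange with a peer (illustrative names).
#[derive(Debug, PartialEq, Eq)]
pub enum Handshake {
    /// Epochs match: the connection is established.
    Established,
    /// The peer is on an older generation: retry the connection.
    Retry,
    /// The peer is on a newer generation: fail the protocol.
    Fail,
}

/// Applies the comparison rules from the protocol description.
pub fn compare_epochs(my_epoch: u64, peer_epoch: u64) -> Handshake {
    match my_epoch.cmp(&peer_epoch) {
        Ordering::Less => Handshake::Fail,
        Ordering::Greater => Handshake::Retry,
        Ordering::Equal => Handshake::Established,
    }
}

/// On the connecting side, a process without an epoch adopts the epoch of
/// the first peer it hears from; otherwise it keeps its own.
pub fn adopt_epoch(my_epoch: Option<u64>, peer_epoch: u64) -> u64 {
    my_epoch.unwrap_or(peer_epoch)
}
```

A process that adopts its epoch from a peer trivially passes the comparison with that peer. The comparison only becomes interesting for later connections, where a mismatch reveals that the two processes belong to different generations.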

Process 0 acts as the leader of a generation. When a process connects to process 0 and learns its epoch, it becomes part of that generation and can no longer connect to processes of other generations. When a process that was part of a generation crashes, it dooms that generation. When it restarts, it can’t connect to the same generation anymore because process 0 won’t accept the connection. What’s more, in attempting to connect to the processes of the doomed generation, the new process forces them to fail the protocol and rejoin as part of the new generation, ensuring we don’t get stuck with multiple processes on different generations indefinitely.

Structs§

Epoch 🔒
Epoch type used in the create_sockets protocol.

Enums§

CreateSocketsError 🔒
Errors returned by create_sockets.

Functions§

accept_higher 🔒
Accept connections from peers with indexes greater than my_index.
connect_lower 🔒
Connect to peers with indexes less than my_index.
create_sockets 🔒
Creates socket connections from a list of host addresses.
initialize_networking
Creates a communication mesh from the cluster config.
initialize_networking_inner 🔒
timeout 🔒
Helper for performing network I/O under a timeout.