Expand description
Code to spin up communication mesh for a Timely cluster.
The heart of this code is create_sockets
, which establishes connections with the other
processes (“peers”) in the Timely cluster. This process needs to be fault-tolerant: If one or
multiple processes restart while connections are established, this must not leave the cluster
in a stalled state where the processes cannot finish setting up connections for whatever
reason.
A Timely cluster assumes reliable networking among all processes in the cluster and forces its
processes to crash if this condition is not met. It is therefore impossible to set up a working
Timely cluster in the presence of persistent process or network failures. However, we assume
that any period of instability eventually resolves. create_sockets
is written to ensure
that once things are stable again, processes can correctly establish connections among each
other.
If a process returns from create_sockets
with one or more sockets that are connected to
processes that have crashed, that process will also crash upon initializing its side of the
Timely cluster. We can say that processes connected to crashed processes are “doomed”.
Additionally, all processes connected to doomed processes are doomed themselves, as their
doomed peer will eventually crash, forcing them to crash too. We need to avoid a circle of doom
where new processes perpetually connect to doomed processes, become doomed themselves, doom
other processes that connect to them, and then crash.
create_sockets
avoids the circle of doom by ensuring that a new generation of processes
does not connect to the previous generation. We pessimistically assume that the entire previous
generation has been doomed and to successfully connect we need to spin up an entire new
generation. This approach can cause us to force restarts of non-doomed processes and therefore
leaves some efficiency on the floor, but we are more concerned about our ability to reason
about the system than about cluster startup time.
To differentiate between generations, we rely on an epoch, i.e., a number that increases between process restarts. Unfortunately, we don’t have a way to get a perfect epoch here, so we use the system time instead. Note that the system time is not guaranteed to always increase, but as long as it increases eventually, we will eventually succeed in establishing connections between processes.
Each process performs the following protocol:
- Let
my_index
be the index of the process in the Timely cluster. - If
my_index
== 0, mint a newmy_epoch
. Otherwise leavemy_epoch
uninitialized. - For each
index
<my_index
:- Connect to the peer at
index
. - Receive
peer_epoch
. - If
my_epoch
is unitialized, setmy_epoch
topeer_epoch
. - Send
my_epoch
. - Compare epochs:
my_epoch
<peer_epoch
: fail the protocolmy_epoch
>peer_epoch
: retry the connectionmy_epoch
==peer_epoch
: connection successfully established
- Connect to the peer at
- Until a connections has been established with all peers:
- Accept a connection from a peer at
index
>my_index
. - If a connection to a peer at
index
was already established, fail the protocol. - Send
my_epoch
. - Receive
peer_epoch
. - Compare epochs and react as above.
- Accept a connection from a peer at
Process 0 acts as the leader of a generation. When a process connects to process 0 and learns its epoch, it becomes part of that generation and cannot connect to processes of other generations anymore. When a process crashes that was previously part of a generation, it dooms that generation. When it restarts, it can’t connect to the same generation anymore because process 0 won’t accept the connection. What’s more, in attempting to connect to the processes of the doomed generation, the new process forces them to fail the protocol and rejoin as part of the new generation, ensuring we don’t get stuck with multiple processes on different generations indefinitely.
Structs§
- Epoch 🔒
- Epoch type used in the
create_sockets
protocol.
Enums§
- Create
Sockets 🔒Error - Errors returned by
create_sockets
.
Functions§
- accept_
higher 🔒 - Accept connections from peers with indexes greater than
my_index
. - connect_
lower 🔒 - Connect to peers with indexes less than
my_index
. - create_
sockets 🔒 - Creates socket connections from a list of host addresses.
- initialize_
networking - Creates communication mesh from cluster config
- initialize_
networking_ 🔒inner - timeout 🔒
- Helper for performing network I/O under a timeout.