mz_cluster

Module communication

Code to spin up the communication mesh for a cluster replica.

The startup protocol is as follows: The controller in environmentd, after having connected to all the clusterd processes in a replica, sends each of them a CreateTimely command containing an epoch value (which is the same across all copies of the command). The meaning of this value is irrelevant, as long as it is totally ordered and increases monotonically (including across environmentd restarts).

In the past, we've seen issues caused by environmentd's replica connections flapping repeatedly and causing several instances of the startup code to spin up in quick succession (or even simultaneously) in response to different CreateTimely commands, causing mass confusion among the processes and possible crash loops. To avoid this, we do not allow processes to connect to each other unless they are responding to a CreateTimely command with the same epoch value. If a process discovers a peer with a lower epoch value, it ignores that peer; if it discovers one with a higher epoch value, it aborts the connection. A process that aborts in this way is guaranteed to eventually hear about the higher epoch value (and thus successfully connect to its peers), since environmentd sends CreateTimely commands to all processes in a replica.
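The epoch rule above can be sketched as a three-way comparison. This is an illustrative sketch only: `EpochDecision` and `decide` are hypothetical names that do not come from this module, and the epoch is modeled as any type implementing `Ord`, which is an assumption about its representation.

```rust
use std::cmp::Ordering;

/// Hypothetical outcome of comparing our epoch with a peer's epoch.
#[derive(Debug, PartialEq, Eq)]
enum EpochDecision {
    /// Both sides are responding to the same CreateTimely command.
    Proceed,
    /// The peer is behind us; ignore it and keep waiting.
    IgnorePeer,
    /// The peer is ahead of us; this startup attempt cannot succeed.
    Abort,
}

/// Decide what to do with a peer, given our epoch and the peer's epoch.
/// The epoch type is an assumption: anything totally ordered works here.
fn decide<E: Ord>(mine: &E, peer: &E) -> EpochDecision {
    match peer.cmp(mine) {
        Ordering::Equal => EpochDecision::Proceed,
        Ordering::Less => EpochDecision::IgnorePeer,
        Ordering::Greater => EpochDecision::Abort,
    }
}
```

Because only the ordering matters, the same function would work for a plain integer or for a composite epoch, provided its `Ord` implementation increases across environmentd restarts.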

Concretely, each process awaits connections from its peers with higher indices, and initiates connections to those with lower indices. Having established a TCP connection, they exchange epochs, to enable the logic described above.
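As a rough illustration of the index rule, the sketch below computes, for one process, which peers it would dial and which it would wait for. `PeerRole` and `plan_connections` are hypothetical names introduced for this example, and process indices are assumed to be the contiguous range `0..process_count`.

```rust
/// Hypothetical role this process plays toward a given peer.
#[derive(Debug, PartialEq, Eq)]
enum PeerRole {
    /// We initiate the TCP connection (peer has a lower index).
    Dial,
    /// We await the peer's connection (peer has a higher index).
    Accept,
}

/// For the process at `my_index`, list every other process together with
/// the role we play toward it. Assumes indices are `0..process_count`.
fn plan_connections(my_index: usize, process_count: usize) -> Vec<(usize, PeerRole)> {
    (0..process_count)
        .filter(|&peer| peer != my_index)
        .map(|peer| {
            if peer < my_index {
                (peer, PeerRole::Dial)
            } else {
                (peer, PeerRole::Accept)
            }
        })
        .collect()
}
```

Under this rule each pair of processes is connected exactly once: the higher-indexed process of the pair dials and the lower-indexed one accepts. The epoch exchange then happens on each established connection.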

Structs

  • This task can never successfully boot, since a peer has seen a higher epoch from environmentd.

Functions