Consensus and Replication in Elasticsearch

Consensus algorithms are foundational to distributed systems. Choosing among Paxos and its many variants ultimately determines the performance and fault tolerance of the underlying system. Boaz, Jason, and Yannick will discuss the basic mechanics of quorum-based consensus algorithms and their tradeoffs compared with the primary-backup approach. Elasticsearch uses both. They will show how these two layers work together to facilitate cluster state changes and data replication while guaranteeing speed and safety. They will finish with a deep dive into the data replication layer and the recent addition of sequence numbers, which are the foundation of faster operation-based recoveries and cross-data-center replication.

Boaz Leskes

Boaz is a core Elasticsearch developer. When he is not working on consensus algorithms, cluster state changes, data replication, and sequence numbers, you can find him at the ping-pong table, playing office DJ, collaborating with colleagues on Zoom, or, if it's Friday, eating hummus.

Jason Tedor

Jason Tedor is a software engineer at Elastic and an Elasticsearch core developer with a love for all things distributed. Before joining Elastic, Jason worked as a backend engineer, using the Hadoop ecosystem to handle one of the largest clinical datasets in the world, and built Monte Carlo simulations to model commercial loan portfolios.

Yannick Welsch

Yannick Welsch is an Elasticsearch engineer based in Luxembourg. He works on the distributed bits of Elasticsearch, drawing on his experience running large clusters and his background in formal specification languages.