“Paxos Made Simple” (2001)
Paper link from author’s website: https://lamport.azurewebsites.net/pubs/paxos-simple.pdf
Summary: This paper describes the Paxos algorithm in plain prose, aiming to make it easier to understand than the original presentation in “The Part-Time Parliament” (Lamport, 1998). Paxos was one of the first consensus algorithms to come with a formal proof of safety. Many variants that follow the principles of the base algorithm now exist, each optimized for particular use cases and settings such as RDMA-based networks. Paxos is also widely used in production systems.
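As a rough illustration of the base algorithm, the following is a minimal single-decree sketch in Python. It assumes in-memory acceptors and synchronous calls with no message loss, and the names `Acceptor` and `propose` are hypothetical; it is not code from the paper, which describes the protocol in prose only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                          # highest proposal number promised so far
    accepted: Optional[Tuple[int, str]] = None  # (number, value) of the last accepted proposal

    def prepare(self, n: int):
        """Phase 1b: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted
        return False, None

    def accept(self, n: int, value: str) -> bool:
        """Phase 2b: accept unless a higher-numbered prepare was already promised."""
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False


def propose(acceptors: List[Acceptor], n: int, value: str) -> Optional[str]:
    """Run both phases for proposal number n; return the chosen value or None."""
    majority = len(acceptors) // 2 + 1

    # Phase 1a: ask every acceptor for a promise.
    replies = [a.prepare(n) for a in acceptors]
    granted = [prev for ok, prev in replies if ok]
    if len(granted) < majority:
        return None

    # Adopt the value of the highest-numbered proposal already accepted, if any.
    previous = [p for p in granted if p is not None]
    if previous:
        value = max(previous, key=lambda p: p[0])[1]

    # Phase 2a: ask the acceptors to accept (n, value).
    votes = sum(a.accept(n, value) for a in acceptors)
    return value if votes >= majority else None


acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, 1, "A")   # no prior proposals, so "A" is chosen
second = propose(acceptors, 2, "B")  # must adopt the already accepted "A"
stale = propose(acceptors, 1, "C")   # number too low, rejected in phase 1
```

In this run the second proposer asks for "B" but still ends up with "A", which is the safety property the paper argues for: once a value is chosen, later proposals can only choose the same value.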
“Paxos Made Live – An Engineering Perspective” (2007)
DBLP: https://dblp.uni-trier.de/rec/html/conf/podc/ChandraGR07
Summary: This paper is an experience report on replacing the existing internal replication layer of Chubby, a distributed lock service, with a Paxos-based implementation. It highlights the pitfalls and challenges of using Paxos in a practical system, for example handling disk corruption, group membership changes, snapshots, and command log management.
“APUS: Fast and Scalable Paxos on RDMA” (2017)
DBLP: https://dblp.uni-trier.de/rec/html/conf/cloud/WangJCYC17
Summary: The authors describe a runtime system that uses LD_PRELOAD to intercept the inbound socket calls of unreplicated server programs. The intercepted data is replicated through an atomic broadcast layer that implements a Paxos-based protocol over RDMA. This makes it possible to replicate a program for fault tolerance without modifying the application code.
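The idea of replicating intercepted input can be sketched in a few lines of Python. This is only a conceptual model under strong assumptions: it uses neither LD_PRELOAD nor RDMA, a plain list stands in for the agreed order that a consensus protocol would produce, and `Replica` and `ReplicatedInput` are hypothetical names, not part of APUS.

```python
from typing import Dict, List


class Replica:
    """A deterministic server: its state depends only on the inputs it handled."""

    def __init__(self) -> None:
        self.store: Dict[bytes, bytes] = {}

    def handle(self, request: bytes) -> bytes:
        # Toy request format "key=value"; the server itself knows nothing
        # about replication.
        key, _, value = request.partition(b"=")
        self.store[key] = value
        return b"OK"


class ReplicatedInput:
    """Hypothetical stand-in for the interception layer in front of the server."""

    def __init__(self, replicas: List[Replica]) -> None:
        self.replicas = replicas
        self.log: List[bytes] = []  # order in which inbound requests were recorded

    def recv(self, request: bytes) -> bytes:
        # A real system would run consensus here; this sketch just appends.
        self.log.append(request)
        # Deliver the request to every replica in log order.
        replies = [r.handle(request) for r in self.replicas]
        return replies[0]


replicas = [Replica() for _ in range(3)]
layer = ReplicatedInput(replicas)
layer.recv(b"x=1")
layer.recv(b"x=2")
```

Because every replica sees the same requests in the same order and the handler is deterministic, all three replicas end in the same state, while the `Replica` class needed no changes to be replicated.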