Pushing Bytes: Cloud-Scale Data Replication with RDMC
Behrens, Jonathan; Jha, Sagar; Tremel, Edward; Birman, Ken
Data center infrastructures frequently replicate objects to create backups or to copy executables and input files to compute nodes. This task occurs under time pressure: data is at risk of loss until replicated for fault-tolerance, and in the case of parallel processing systems like Spark, useful computation cannot start until all nodes have a copy of the executable images. Cloud elasticity creates a similar need to rapidly copy executables and their inputs. To address these needs, we introduce RDMC: a fast, reliable data replication protocol that implements multicast as a pattern of RDMA unicast operations, which maximizes concurrency while minimizing unnecessary transfers. RDMC can be used in any setting that has RDMA or a software RDMA emulation. Our focus is on the use of replication as an element of the data center infrastructure. We evaluate overheads for the hardware-supported case using microbenchmarks and heavy-load experiments, and also describe preliminary experiments using a technique that offloads the entire data transfer pattern into the NICs, further reducing latency while freeing server resources for other tasks.