Scalable Data Management in the Presence of High-Speed Networks (BI2011/1)

PI: Carsten Binnig (TU Darmstadt)
Project Collaborator: Tobias Ziegler
Project website: Scalable Data Management in the Presence of High-Speed Networks

This project will study the impact of the next generation of high-speed network technologies on scalable data management. Modern data management systems store their data in main memory for fast access. For these systems, cross-machine communication is a major bottleneck that needs to be avoided as much as possible. As a result, existing database systems are designed from the ground up to avoid communication. Yet, with high-speed networks capable of Remote Direct Memory Access (RDMA), the fundamental assumption behind all these techniques no longer holds. InfiniBand FDR 4x has a bandwidth in the same ballpark as that of one memory channel, and the more recent EDR 4x standard increases it further. Furthermore, with advances in RDMA, which allows one machine to access the memory of a remote machine directly without involving the remote CPU, communication latencies will continue to decrease. Consequently, this hardware trend creates an inflection point for new distributed database architectures.
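
The bandwidth comparison above can be checked with a short back-of-the-envelope calculation. The sketch below uses the published InfiniBand per-lane signaling rates and 64b/66b encoding for FDR and EDR. The choice of a DDR3-1600 channel as the reference "memory channel" is an assumption made here for illustration, as are the function names; the text does not specify a memory type.

```python
from fractions import Fraction

# FDR and EDR links use 64b/66b encoding: 64 payload bits per 66 signaled bits.
ENCODING_64B66B = Fraction(64, 66)

# Per-lane signaling rates in Gbit/s.
FDR_LANE_GBIT = Fraction(140625, 10000)    # 14.0625 Gbit/s
EDR_LANE_GBIT = Fraction(2578125, 100000)  # 25.78125 Gbit/s


def infiniband_gbytes_per_s(lane_gbit, lanes=4):
    """Theoretical data rate of an InfiniBand link in GB/s (one direction)."""
    data_gbit = lane_gbit * lanes * ENCODING_64B66B
    return float(data_gbit / 8)


def ddr_channel_gbytes_per_s(mega_transfers_per_s, bus_bytes=8):
    """Peak bandwidth of one DDR memory channel in GB/s (64-bit bus by default)."""
    return mega_transfers_per_s * bus_bytes / 1000


fdr_4x = infiniband_gbytes_per_s(FDR_LANE_GBIT)
edr_4x = infiniband_gbytes_per_s(EDR_LANE_GBIT)
# Assumption: DDR3-1600 stands in for "one memory channel".
ddr3_1600 = ddr_channel_gbytes_per_s(1600)

for name, value in [("FDR 4x", fdr_4x), ("EDR 4x", edr_4x), ("DDR3-1600 channel", ddr3_1600)]:
    print(f"{name}: {value:.2f} GB/s")
```

Under these assumptions, FDR 4x reaches roughly half the peak bandwidth of one DDR3-1600 channel and EDR 4x is nearly on par with it. These are theoretical peaks; achievable throughput on real hardware is lower on both sides.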

This proposal explores the foundations of scalable data management over high-speed RDMA-capable networks, which is one of the key aspects of the priority program 2037. In this proposal we will study the following directions:

  1. First, we want to analyze the fundamental properties of modern RDMA-capable networks to enable truly scalable distributed database systems. While existing work has studied the effect of individual aspects (e.g., one-sided vs. two-sided communication) in rather small clusters, we aim to holistically study the complete design space to achieve scalability.
  2. Second, based on these findings, we will build easy-to-use abstractions for storage and computation that enable scalability. The goal is for these abstractions to support a wide range of workloads, from traditional ones (OLAP and OLTP) to more complex ones (e.g., machine learning).
  3. Finally, we will implement different applications to demonstrate scalability for a wide set of workloads and applications when using these abstractions. Special emphasis will be placed on experimental evaluation on large-scale deployments with up to 100 nodes, using both synthetic benchmarks and real-world applications.

The main result of this proposal will be an easy-to-use abstraction on top of which system builders can implement their own scalable data management systems. Most importantly, we will make the results of this proposal available to all other SPP projects. While we do not expect any of our prototype systems for the different workloads to be a feature-complete database system, they will foster technology transfer and provide a platform for other researchers in the priority program to explore the impact of high-speed networks. Also, given that high-speed networks are quickly decreasing in price, we expect them to become common within the next few years. We are well positioned to seed this new area of research, and therefore to lead in the design of next-generation database systems.