ReProVide: Query Optimisation and Near-Data Processing on Reconfigurable SoCs for Big Data Analysis (ME 943/9, TE163/21, WI4415/1)

PIs: Klaus Meyer-Wegener, Jürgen Teich, Stefan Wildermann (Uni Erlangen-Nuremberg)
Project Collaborators: Andreas Becher, Lekshmi B.G.
Project website: ReProVide
Demo video

The new Priority Programme 2037 addresses the challenge of the exponential growth in the volume, velocity, and variety of data produced every day. Consequently, every available means of parallel processing and of exploiting the heterogeneity of emerging computer architectures is needed to analyse petabytes of data in the shortest possible time. This requires not only minimising the time for processing and filtering huge amounts of data, but also novel solutions for organising the memory layout and for avoiding bottlenecks in data transport between memory and processing units.

In this project, we propose, analyse and design a novel FPGA-based System-on-Chip (SoC) architecture called ReProVide (Reconfigurable Data ProVider) for near-data processing of big data sources. This platform serves as an intelligent storage system and, at the same time, as a reconfigurable data (pre-)processing interface between diverse data sources and the host systems requesting data from these sources, thus greatly reducing network bandwidth requirements, host workload, and power consumption.

The uniqueness of our approach lies in providing query-specific accelerator datapaths and filter functions on demand and in systematically mapping them at run time to the ReProVide platform, exploiting the fact that the hardware of an FPGA can be dynamically reconfigured. We will show that optimisation and reconfiguration times become increasingly negligible as the volume of data to be processed grows. Through reconfiguration, ReProVide will be able to provide customised acceleration for processing a variety of existing and emerging database formats (e.g., column- or document-oriented schemata in NoSQL) as well as data streams, e.g., stemming from click-stream analysis or from sensors in the domains of the Internet of Things and Industrie 4.0. Moreover, on-the-fly data re-formatting functions (schema-on-read) are easily supported.
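The claim that optimisation and reconfiguration times lose weight as the data volume grows can be illustrated with a simple amortisation model. The following is a minimal sketch, not the project's cost model: the function name `overhead_fraction`, the linear throughput assumption, and all numeric values are hypothetical and chosen only for illustration.

```python
def overhead_fraction(data_bytes, reconfig_s, optimise_s, throughput_bps):
    """Share of total runtime spent on optimisation and reconfiguration.

    Assumes (hypothetically) a fixed one-off cost per query and a
    processing time that grows linearly with the data volume.
    """
    fixed = reconfig_s + optimise_s
    processing = data_bytes / throughput_bps
    return fixed / (fixed + processing)


# Hypothetical figures, not measurements from ReProVide.
RECONFIG_S = 0.02    # assumed partial reconfiguration time in seconds
OPTIMISE_S = 0.005   # assumed query optimisation time in seconds
THROUGHPUT = 1e9     # assumed accelerator throughput in bytes per second

for size in (1e6, 1e9, 1e12):
    share = overhead_fraction(size, RECONFIG_S, OPTIMISE_S, THROUGHPUT)
    print(f"{size:.0e} bytes: fixed overhead is {share:.4%} of total runtime")
```

Under these assumptions the fixed cost dominates for small inputs and shrinks towards zero as the input grows, which is the behaviour the paragraph above describes.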

Apart from the development of the SoC platform and of a set of basic operators from which the accelerator modules typically encountered in Big Data analytics may be assembled, ReProVide requires novel hardware/software co-design techniques to achieve customised data processing with minimal execution time. In particular, new cost models and hierarchical query optimisation techniques will be investigated, in which a global optimiser and an architecture-specific local optimiser work together to enable scalable query processing.
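One way to picture the cooperation between a global and an architecture-specific local optimiser is a placement decision driven by cost estimates. The sketch below is purely illustrative and does not reflect the actual ReProVide interfaces: the classes `Operator`, `LocalOptimiser` and `GlobalOptimiser`, their parameters, and the linear cost formulas are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Operator:
    name: str
    selectivity: float  # fraction of the input that survives the operator


class LocalOptimiser:
    """Hypothetical optimiser at the storage node; knows the hardware."""

    def __init__(self, supported, reconfig_s, scan_bps):
        self.supported = set(supported)
        self.reconfig_s = reconfig_s  # assumed reconfiguration time (s)
        self.scan_bps = scan_bps      # assumed accelerator throughput (B/s)

    def estimate(self, op, data_bytes):
        """Estimated seconds to run op near the data, or None if unsupported."""
        if op.name not in self.supported:
            return None
        return self.reconfig_s + data_bytes / self.scan_bps


class GlobalOptimiser:
    """Hypothetical host-side optimiser; decides where an operator runs."""

    def __init__(self, local, network_bps, host_bps):
        self.local = local
        self.network_bps = network_bps
        self.host_bps = host_bps

    def place(self, op, data_bytes):
        # Host plan: ship all data over the network, then process it.
        host_cost = data_bytes / self.network_bps + data_bytes / self.host_bps
        local_cost = self.local.estimate(op, data_bytes)
        if local_cost is None:
            return "host", host_cost
        # Pushdown plan: process near the data, ship only the survivors.
        shipped = op.selectivity * data_bytes
        pushdown_cost = local_cost + shipped / self.network_bps
        if pushdown_cost < host_cost:
            return "near-data", pushdown_cost
        return "host", host_cost


local = LocalOptimiser({"filter"}, reconfig_s=0.02, scan_bps=2e9)
planner = GlobalOptimiser(local, network_bps=1e9, host_bps=4e9)
print(planner.place(Operator("filter", 0.01), 1e10))
print(planner.place(Operator("filter", 0.01), 1e3))
```

In this sketch the global optimiser never needs to know hardware details; it only compares the estimate returned by the local optimiser with its own host-side estimate, so a selective filter over a large input is pushed down while a tiny input stays on the host.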

The platform and our methodology will be tested using scenarios and benchmarks as well as applications developed by other projects within SPP2037.