Bibliography – Cost Models & Performance Analysis

“A Holistic View of Stream Partitioning Costs” (2017)

DBLP: https://dblp.org/rec/html/journals/pvldb/KatsipoulakisLC17

Summary: This paper proposes a new cost model for partitioning stream queries. The authors combine aggregation cost and load imbalance when partitioning stateful operations, and give a formula that weights the two against each other. Their solution performs best when the number of partitions is high, which makes it especially interesting for manycore systems.
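
To make the trade-off in the summary above concrete, here is a minimal sketch of a weighted cost that combines load imbalance with aggregation overhead. It is not the formula from the paper: the function name `partition_cost`, the `weight` parameter, and both normalizations are assumptions chosen only for illustration.

```python
from collections import Counter


def partition_cost(assignments, num_partitions, weight=0.5):
    """Illustrative cost of a partitioning (hypothetical, not the paper's formula).

    assignments: list of (key, partition) pairs, one pair per input tuple.
    weight: share given to load imbalance; the rest goes to aggregation cost.
    """
    if not assignments:
        raise ValueError("assignments must not be empty")

    # Load imbalance: how far the busiest partition is above the mean load.
    load = Counter(partition for _, partition in assignments)
    mean_load = len(assignments) / num_partitions
    imbalance = (max(load.values()) - mean_load) / mean_load

    # Aggregation cost: extra partial results that must be merged because
    # a key's state is spread over more than one partition.
    partial_results = len(set(assignments))
    distinct_keys = len({key for key, _ in assignments})
    aggregation = (partial_results - distinct_keys) / distinct_keys

    return weight * imbalance + (1 - weight) * aggregation


# Skewed input: key "a" appears six times, "b" and "c" once each.
by_key = [("a", 0)] * 6 + [("b", 1), ("c", 1)]          # key-based placement
round_robin = [("a", 0), ("a", 1)] * 3 + [("b", 0), ("c", 1)]  # shuffle placement

cost_by_key = partition_cost(by_key, num_partitions=2)
cost_round_robin = partition_cost(round_robin, num_partitions=2)
```

In this toy input, key-based placement pays only for imbalance, while round-robin placement pays only for merging the split state of key "a"; changing `weight` shifts which of the two looks cheaper.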

“Predicting query execution time: Are optimizer cost models really unusable?” (2013)

DBLP: https://dblp.org/rec/html/conf/icde/WuCZTHN13

Summary: In this paper, the authors estimate the cardinality, and thus the execution cost, of a query plan after optimization but before execution. The cardinality estimates are computed only for the single plan chosen by the optimizer rather than for multiple candidate plans. The cost model is calibrated using offline profiling, and sampling is then used to refine the cardinality estimates for the selected plan. Perhaps the most interesting aspect of this work is the basic question it raises: should query running time prediction treat the DBMS as a black box (the machine learning approach), or should we exploit the fact that we actually know exactly what is going on inside the box (the optimizer-based approach)?
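
The two steps named in the summary above, offline calibration and sampling-based cardinality refinement, can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the function names, the cost-unit names `seq_page` and `cpu_tuple`, and all numbers are hypothetical.

```python
def calibrate_unit(measured_seconds, operation_count):
    """Offline profiling step (assumed form): seconds per primitive operation,
    taken from a profiling query dominated by that one operation."""
    return measured_seconds / operation_count


def estimate_cardinality(sample_matches, sample_size, table_size):
    """Sampling step (assumed form): scale the selectivity seen on a sample
    up to the full table."""
    return sample_matches / sample_size * table_size


def predict_time(unit_costs, counts):
    """Predicted running time: calibrated unit cost times operation count,
    summed over all cost units that the plan uses."""
    return sum(unit_costs[name] * counts[name] for name in counts)


# Hypothetical calibration results for two cost units.
unit_costs = {
    "seq_page": calibrate_unit(measured_seconds=2.0, operation_count=1000),
    "cpu_tuple": calibrate_unit(measured_seconds=1.0, operation_count=100000),
}

# Hypothetical plan: 30 of 1000 sampled rows match, table has 200000 rows.
rows = estimate_cardinality(sample_matches=30, sample_size=1000, table_size=200000)

# Operation counts for the single plan chosen by the optimizer.
predicted_seconds = predict_time(unit_costs, {"seq_page": 500, "cpu_tuple": rows})
```

The point of the sketch is the division of labor: calibration fixes the per-operation costs once, while sampling improves the operation counts for the one plan that will actually run.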

“A Common Runtime for High Performance Data Analysis” (2017)

DBLP: https://dblp.org/rec/conf/cidr/PalkarTSSAZ17

Summary: The paper presents an efficient way to combine multiple optimized libraries without losing efficiency when passing results between them. It provides an intermediate representation (IR) to describe data manipulation operations. Because the IR has to be compiled into efficient code, the authors also provide a backend compiler and a runtime that handles execution efficiently. They evaluate their solution on different use cases (DBMS workloads, data cleaning, etc.). Finally, they show that their solution can achieve a 30x speedup when combining multiple frameworks.