Provenance-based Data Skipping (TechReport)
Database systems analyze queries to determine upfront which data is needed for answering them and use indexes and other physical design techniques to speed-up access to that data. However, for important classes of queries, e.g., HAVING and top-k queries, it is impossible to determine up-front what data is relevant. To overcome this limitation, we develop provenance-based data skipping (PBDS), a novel approach that generates provenance sketches to concisely encode what data is relevant for a query. Once a provenance sketch has been captured it is used to speed up subsequent queries. PBDS can exploit physical design artifacts such as indexes and zone maps. Our approach significantly improves performance for both disk-based and main-memory database systems.
READ FULL TEXT