Foresight: Rapid Data Exploration Through Guideposts
Current tools for exploratory data analysis (EDA) require users to manually select data attributes, statistical computations and visual encodings. This can be daunting for large-scale, complex data. We introduce Foresight, a visualization recommender system that helps the user rapidly explore large high-dimensional datasets through "guideposts." A guidepost is a visualization corresponding to a pronounced instance of a statistical descriptor of the underlying data, such as a strong linear correlation between two attributes, high skewness or concentration about the mean of a single attribute, or a strong clustering of values. For each descriptor, Foresight initially presents visualizations of the "strongest" instances, based on an appropriate ranking metric. Given these initial guideposts, the user can then look at "nearby" guideposts by issuing "guidepost queries" containing constraints on metric type, metric strength, data attributes, and data values. Thus, the user can directly explore the network of guideposts, rather than the overwhelming space of data attributes and visual encodings. Foresight also provides for each descriptor a global visualization of ranking-metric values to both help orient the user and ensure a thorough exploration process. Foresight facilitates interactive exploration of large datasets using fast, approximate sketching to compute ranking metrics. We also contribute insights on EDA practices of data scientists, summarizing results from an interview study we conducted to inform the design of Foresight.
READ FULL TEXT