Comprehensive and Comprehensible Data Catalogs: The What, Who, Where, When, Why, and How of Metadata Management
Scalable data science requires access to metadata, which is increasingly managed by databases called data catalogs. With today's data catalogs, users must choose between designs that make metadata easy to store and designs that make it easy to retrieve, but they cannot have both. We find that this problem arises because catalogs lack an easy-to-understand mental model. In this paper, we present a new catalog mental model called 5W1H+R. The new mental model is comprehensive, in that it represents a wide range of metadata, and comprehensible, in that it lets users locate metadata easily. We demonstrate these properties via a user study. We then study different schema designs for implementing the new mental model and evaluate them on different backends to understand their relative merits. We conclude that mental models are important for making data catalogs more useful and for boosting the metadata management efforts that are crucial for data science tasks.