Elements and Principles of Data Analysis
The data revolution has led to an increased interest in the practice of data analysis. As a result, there has been a proliferation of "data science" training programs. Because data science has been previously defined as an intersection of already-established fields or union of emerging technologies, the following problems arise: (1) There is little agreement about what is data science; (2) Data science becomes secondary to established fields in a university setting; and (3) It is difficult to have discussions on what it means to learn about data science, to teach data science courses and to be a data scientist. To address these problems, we propose to define the field from first principles based on the activities of people who analyze data with a language and taxonomy for describing a data analysis in a manner spanning disciplines. Here, we describe the elements and principles of data analysis. This leads to two insights: it suggests a formal mechanism to evaluate data analyses based on objective characteristics, and it provides a framework to teach students how to build data analyses. We argue that the elements and principles of data analysis lay the foundational framework for a more general theory of data science.
READ FULL TEXT