11.1 Introduction

Data is not information, information is not knowledge, knowledge is not understanding, understanding is not wisdom. [C. Stoll (attributed), Nothing to Hide: Privacy in the 21st Century, 2006]

One of the challenges of working in the data science (DS), machine learning (ML) and artificial intelligence (AI) fields is that nearly all quantitative work can be described with some combination of the terms DS/ML/AI (often to a ridiculous extent).

Robinson [135] suggests that their relationships follow an inclusive hierarchical structure:

  • in a first stage, DS provides “insights” via visualization and (manual) inferential analysis;

  • in a second stage, ML yields “predictions” (or “advice”), while reducing the operator’s analytical, inferential and decisional workload (although it is still present to some extent), and

  • in the final stage, AI removes the need for oversight, allowing for automatic “actions” to be taken by a completely unattended system.

The goals of artificial intelligence are laudable in an academic setting, but in practice, we believe that stakeholders should not seek to abdicate all of their agency in the decision-making process; as such, we follow the lead of various thinkers and suggest further splitting AI into “general AI” (which we will not be pursuing) and “augmented intelligence” (which can be viewed as ML “on steroids”).

With this in mind, our definition of the DS/ML/AI approach is that it consists of quantitative processes (what H. Mason has called “the working intersection of statistics, engineering, computer science, domain expertise, and”hacking” [136]) that can help users learn actionable insights about their situation without completely abdicating their decision-making responsibility.

In this module, we will take a brief look at:


D. Woods, Bitly’s Hilary Mason on "what is a data scientist?",” Forbes, Mar. 2012.