Data science education lacks a much-needed focus on ethics
Undergraduate training for data scientists – dubbed the sexiest job of the 21st century by Harvard Business Review – falls short in preparing students for the ethical use of data science, our new study found. Data science lies at the nexus of statistics and computer science applied to a particular field such as astronomy, linguistics, medicine, psychology or sociology.
The idea behind this data crunching is to use big data to address otherwise unsolvable problems, such as how health care providers can create personalised medicine based on a patient’s genes and how businesses can make purchase predictions based on customers’ behavior.
The US Bureau of Labor Statistics projects 15% growth in data science careers from 2019 to 2029, corresponding with increased demand for training in the field. Universities and colleges have responded to the demand by creating new programmes or revamping existing ones. The number of undergraduate programmes in the field in the US jumped from 13 in 2014 to at least 50 as of September 2020.
In our study, we compared undergraduate data science curricula with the expectations for undergraduate data science training put forth by the National Academies of Sciences, Engineering and Medicine. Those expectations include training in ethics. We found that most programmes dedicated considerable coursework to mathematics, statistics and computer science, but offered little training in ethical considerations such as privacy and systemic bias. Only 50% of the degree programmes we investigated required any coursework in ethics.
Why it matters
As with any powerful tool, the responsible use of data science requires training both in how to apply it and in how to understand its impacts. Our results align with prior work that found little attention is paid to ethics in degree programmes. This suggests that these undergraduate degree programmes may produce a workforce without the training and judgment to apply data science methods responsibly.
It isn’t hard to find examples of irresponsible use of data science. For instance, policing models that have a built-in data bias can lead to an elevated police presence in historically over-policed neighborhoods. In another example, algorithms used by the US health care system are biased in a way that causes Black patients to receive less care than white patients with similar needs.
We believe explicit training in ethical practices would better prepare a socially responsible workforce.
What still isn’t known
While data science is a relatively new field – still being defined as a discipline – guidelines exist for training undergraduate students. These guidelines prompt the question: How much training can we expect in an undergraduate degree?
The National Academies recommend training in 10 areas, including ethical problem solving, communication and data management.
Our work focused on undergraduate degrees at schools classified as R1, meaning they engage in high levels of research activity. Further research could examine the amount of training and preparation in various aspects of data science at the master’s and PhD levels, as well as the nature of undergraduate training at schools of different research levels.
Given that many data science programmes are new, there is considerable opportunity to compare the training that students receive with the expectations of employers.
We plan to expand on our findings by investigating the pressures that might be driving curriculum development for degrees in other disciplines that are seeing similar job market growth.