Sean Owen

Cloudera

Talk Abstract: What “50 Years of Data Science” leaves out

We’re told data science is the key to unlocking the value in big data, but nobody seems to agree on just what it is. Is it engineering, statistics... both? David Donoho’s “50 Years of Data Science”, which is itself a survey of Tukey’s “Future of Data Analysis”, offers one of the best criticisms of the hype around data science from a statistics perspective. It argues that data science is not new (if it’s anything at all) and calls statistics to action (again) to take back the field with a more practical, modern view of what it means to teach statistics and data science.

Drawing on his blog post, Sean Owen responds, offering counterpoints from an engineer, in search of a better understanding of how to teach and practice data science in 2017. Sean explores some key points in the history of data science from the past 50 years in order to build up a more complete view of how data science sprang out of statistics and merged with computer engineering. He concludes by comparing Donoho’s view of what it means to build data science capability with one taken from the experience of organizations doing so in the context of Apache Hadoop, Spark, and other big data tools.

Bio: Sean is Director of Data Science at Cloudera in London. Before Cloudera, he founded Myrrix Ltd. (now the Oryx project) to commercialise large-scale real-time recommender systems on Hadoop. He is an Apache Spark committer and a co-author of O’Reilly Media’s Advanced Analytics with Spark. He was a committer and VP for Apache Mahout, and co-author of Mahout in Action. Previously, Sean was a senior engineer at Google. He holds an MBA from London Business School and a BA from Harvard University.