Week 3

Week 4


You can read O’Reilly books for free with a Harvard login at this web site.

Python for Data Analysis, O’Reilly Media - “Python for Data Analysis is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python. It is also a practical, modern introduction to scientific computing in Python, tailored for data-intensive applications. This is a book about the parts of the Python language and libraries you’ll need to effectively solve a broad set of data analysis problems. This book is not an exposition on analytical methods using Python as the implementation language.”

Machine Learning for Hackers, O’Reilly Media - “If you’re an experienced programmer interested in crunching data, this book will get you started with machine learning—a toolkit of algorithms that enables computers to train themselves to automate useful tasks. Authors Drew Conway and John Myles White help you understand machine learning and statistics tools through a series of hands-on case studies, instead of a traditional math-heavy presentation.”

A translation of the R examples in Machine Learning for Hackers to Python can be found here: http://slendrmeans.wordpress.com/will-it-python/

Data Science for Business

Written by renowned data science experts Foster Provost and Tom Fawcett, Data Science for Business introduces the fundamental principles of data science, and walks you through the “data-analytic thinking” necessary for extracting useful knowledge and business value from the data you collect. This guide also helps you understand the many data-mining techniques in use today.

Based on an MBA course Provost has taught at New York University over the past ten years, Data Science for Business provides examples of real-world business problems to illustrate these principles. You’ll not only learn how to improve communication between business stakeholders and data scientists, but also how to participate intelligently in your company’s data science projects. You’ll also discover how to think data-analytically, and fully appreciate how data science methods can support business decision-making.

Probabilistic Programming and Bayesian Methods for Hackers - “The Bayesian method is the natural approach to inference, yet it is hidden from readers behind chapters of slow, mathematical analysis. The typical text on Bayesian inference involves two to three chapters on probability theory, then enters what Bayesian inference is. Unfortunately, due to mathematical intractability of most Bayesian models, the reader is only shown simple, artificial examples. This can leave the user with a so-what feeling about Bayesian inference. In fact, this was the author’s own prior opinion.”

Basic Data Science Motivation and Introduction

[1] BBC Documentary: The Age of Big Data (58 mins)

[2] Data Science Workflow: Overview and Challenges by Philip Guo

[3] Enterprise Data Analysis and Visualization: An Interview Study, Sean Kandel, Andreas Paepcke, Joseph Hellerstein, Jeffrey Heer, IEEE Visual Analytics Science & Technology (VAST), 2012

[4] That’s Funny…, Howard Wainer and Shaun Lysen, American Scientist, 2009


[1] matplotlib - 2D and 3D plotting in Python, J.R. Johansson, .ipynb

[2] A Gallery of Statistical Graphs in Matplotlib (Matplotlib Defaults), C. Beaumont, .ipynb

[3] A Gallery of Statistical Graphs in Matplotlib, C. Beaumont, .ipynb

[4] Wrangler: Interactive Visual Specification of Data Transformation Scripts, Sean Kandel, Andreas Paepcke, Joseph Hellerstein, Jeffrey Heer, ACM Human Factors in Computing Systems (CHI), 2011 [Data Wrangler tool web site]

Storytelling and presentations

[1] Narrative Visualization: Telling Stories with Data, Edward Segel, Jeffrey Heer, IEEE Trans. Visualization & Comp. Graphics (Proc. InfoVis), 2010

[2] When do stories work? Evidence and illustration in the social sciences, A. Gelman and T. Basboll, 2013

[3] Storytelling, M. Krzywinski & A. Cairo, Nature Methods, 2013 (Rebuttal by Y. Katz, Editorial, Response)

[4] Presentation Zen Tips, Garr Reynolds

[5] Tips for Giving Clear Talks, Kayvon Fatahalian

Data acquisition and cleanup

[1] Web scraping demo, C. Beaumont, .ipynb

[2] Data Wrangling Demo, C. Beaumont, .ipynb


[1] PCA Tutorial, J. Shlens, Princeton University

[2] Principal Components: Mathematics, Example, Interpretation, Cosma Shalizi, CMU

Machine Learning

[1] Chapter 1 of Machine Learning: A Probabilistic Perspective

[2] Cross Validation: The Right and Wrong Way