CS109A: Harvard’s Data Science

2014 CS109A: Harvard's Data Science

Learning from data in order to gain useful predictions and insights. This second iteration of the course continues to cover methods for five key facets of an investigation: data wrangling, cleaning, and sampling to get a suitable data set; data management to access big data quickly and reliably; exploratory data analysis to generate hypotheses and intuition; prediction based on statistical methods such as regression and classification; and communication of results through visualization, stories, and interpretable summaries.

Python was used for all programming assignments and projects. All lectures are posted here.

Instructors

Rafael Irizarry, Biostatistics
Verena Kaynig-Fittkau, Computer Science

Staff

Stephanie Hicks

Material from CS 109 taught
Please find all material linked on this webpage.