2015 - CS109 Data Science

2015 CS109A: Harvard's Data Science

Hubway Clustering

This course is about learning from data in order to gain useful predictions and insights. This third iteration of the course builds on the same ideas as the previous two, using methods from the five key facets of an investigation: data wrangling, cleaning, and sampling to get a suitable data set; data management to access big data quickly and reliably; exploratory data analysis to generate hypotheses and intuition; prediction based on statistical methods such as regression and classification; and communication of results through visualization, stories, and interpretable summaries.

Python was used for all programming assignments and projects. All lectures are posted here.

Instructors

Joe Blitzstein, Statistics
Hanspeter Pfister, Computer Science
Verena Kaynig-Fittkau, Computer Science

Staff

Rahul Dave, Head TF