Harvard's CS109A course is an introductory course in data science, designed for students with some prior programming experience. The course covers a broad range of topics in data science, including data cleaning, visualization, analysis, and machine learning. Students learn to use tools such as Python to work with data and build statistical models. The course also includes a significant group project component, where students work in teams to apply their skills to a real-world data science problem. Overall, the course aims to provide students with a solid foundation in data science and prepare them for further study in the field.
CS109B is a more advanced course in data science at Harvard, focused on topics in deep learning and probabilistic machine learning. The course covers techniques for building and training deep neural networks, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), as well as more recent developments in the field, such as transformers, language models like BERT and GPT, and generative adversarial networks (GANs), including StyleGAN. Additionally, the course explores unsupervised learning techniques, such as clustering, and covers probabilistic machine learning in a Bayesian framework. Overall, the course is designed to provide students with a deep understanding of advanced techniques in data science and prepare them for research or careers in the field.
From 2013 to 2015, the course was offered in a single semester, taught by Hanspeter Pfister and Joe Blitzstein. In 2016 it expanded to two semesters, CS109A and CS109B, and has since been taught by Pavlos Protopapas from SEAS and Natesh Pillai, Kevin Rader, and Mark Glickman from Statistics.
To view the individual previous versions of the course, click on the diagram below.
How to Take this Course
Even if you are not a student at Harvard, you are welcome to “follow” this course for free by working through the course materials that are publicly available here.
- If interested in a verified certificate from edX, enroll here instead.
- If interested in transfer credit and accreditation from Harvard Extension School, enroll here instead.