For your final assignment in this course you will work on a month-long data science project. The goal of the project is to go through the complete data science process to answer questions you have about some topic of your own choosing. You will acquire the data, design your visualizations, run statistical analysis, and communicate the results.
A rough timeline of your project might look something like this:
Week of Nov 23-27: Data collection and cleaning finished
Week of Nov 30-Dec 4: Exploratory analysis finished, some modeling/visualizations done, website started
Week of Dec 7-10: Modeling and/or prediction finished; website, screencast, and possibly visualizations remaining
Of course, each project will have a different timeline; this is only a rough guide to help you plan ahead.
The hard deadlines for your projects are:
December 10: Final project due (11:59 pm)
December 15: Presentations shown in class (location and exact time to be announced)
Any changes that you make to your GitHub repositories and web pages after the due date will be ignored. Please have all your work submitted and tested (websites, screencasts, etc.) before the deadline.
You will work closely with other classmates in a 3-4 person project team. You can form your own team and use Piazza to find prospective team members. If you can’t find partners, we will team you up randomly. We recognize that individual schedules, different time zones, preferences, and other constraints might limit your ability to work in a team. If this is the case, ask us for permission to work alone. In general, we do not anticipate that the grades for each group member will be different. However, we reserve the right to assign different grades to each group member based on peer assessments (see below).
There are a few milestones for your final project; see the course schedule for due dates. Note that no extensions will be given for any of the project due dates for any reason, and late days may not be used. Projects submitted after the final due date will not be graded. If you anticipate any issues (e.g., due to business travel), you need to send an email to the staff mailing list at least one week in advance.
There are several deliverables for your project that will be graded individually to make up your final project score:
You start your project by forming your groups and letting us know what topic you are interested in exploring by submitting a project proposal form. Each team will only need to submit one form. Based on your proposals we will assign a TF to your team who will guide you through the rest of the project.
IPython Process Book
An important part of your project is your IPython process book. Your process book details the steps you took in developing your solution, including how you collected the data, the alternative solutions you tried, the statistical methods you used, and the insights you gained. How you got to your final results is as important as the results themselves! Your process book is the place to describe and document the space of possibilities you explored at each step of your project. We strongly advise you to include many visualizations in your process book.
Your process book should include the following topics. Depending on your project type, the amount of discussion you devote to each of them will vary:
Overview and Motivation: Provide an overview of the project goals and the motivation for it. Consider that this will be read by people who did not see your project proposal.
Related Work: Anything that inspired you, such as a paper, a web site, or something we discussed in class.
Initial Questions: What questions are you trying to answer? How did these questions evolve over the course of the project? What new questions did you consider in the course of your analysis?
Data: Source, scraping method, cleanup, storage, etc.
Exploratory Data Analysis: What visualizations did you use to look at your data in different ways? What are the different statistical methods you considered? Justify the decisions you made, and show any major changes to your ideas. How did you reach these conclusions?
Final Analysis: What did you learn about the data? How did you answer the questions? How can you justify your answers?
Presentation: Present your final results in a compelling and engaging way using text, visualizations, images, and videos on your project web site.
Describe the storytelling elements and goals in your process book, and show us sketches and screenshots of different web site iterations. As this will be your only chance to describe your project in detail, make sure that your process book is a standalone document that fully describes your process and results.
Project Review Meeting
You will schedule a project review meeting with your TF during the assigned review week marked in the schedule. Make sure all of your team members are present at the meeting. Distance education students can schedule an online meeting with their TF.
We expect you to write high-quality, readable Python code in your process book. Strive to do things the right way and think about aspects such as reusability and error handling. We also expect you to document your code.
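As an illustration of what reusable, documented code with error handling might look like, here is a minimal sketch of a data-loading helper. The function name `load_rows` and its parameters are hypothetical and not provided by the course; it uses only the Python standard library, documents its inputs and outputs in a docstring, and raises informative errors instead of failing silently when the data file or an expected column is missing.

```python
import csv
from pathlib import Path


def load_rows(path, required_columns):
    """Load a CSV file into a list of dicts and check for required columns.

    Parameters
    ----------
    path : str or pathlib.Path
        Location of the CSV file.
    required_columns : iterable of str
        Column names the analysis depends on.

    Returns
    -------
    list of dict
        One dict per data row, keyed by column name (values are strings).

    Raises
    ------
    FileNotFoundError
        If `path` does not exist.
    ValueError
        If any required column is missing from the header row.
    """
    path = Path(path)
    # Fail early with a clear message rather than a bare traceback later on.
    if not path.exists():
        raise FileNotFoundError(f"Data file not found: {path}")
    with path.open(newline="") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames or []
        # Validate the schema before any analysis code touches the rows.
        missing = [c for c in required_columns if c not in header]
        if missing:
            raise ValueError(f"Missing required columns: {missing}")
        return list(reader)
```

Because the checks live in one small function, every notebook cell that reads data can call it and get the same validation, which keeps the process book shorter and easier to debug.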
You will create a public website for your project using Google Sites, GitHub Pages, or any other web hosting service of your choice. The website should effectively summarize the main results of your project and tell a story. Consider your audience (the site is public) and pitch the discussion at an appropriate level. Your IPython process book and data should also be linked from the website, either as a zip file or through GitHub, Bitbucket, or another code hosting site. Also embed your main visualizations and your screencast in your website.
Each team will create a two-minute narrated screencast showing a demo of your IPython process book and/or some slides. Information about how to prepare these screencasts can be found here. Please make sure that the sound quality of your video is good; it may be worthwhile to borrow or invest in an external USB microphone. Upload the video to an online video platform such as YouTube or Vimeo and embed it in your project web page. We will show the best videos in class. We will strictly enforce the two-minute time limit, so make sure your video does not run over. Use principles of good storytelling and presentation to get your key points across. Focus most of your screencast on your main contributions rather than on technical details. What do you feel is the best part of your project? What insights did you gain? What is the single most important thing you would like your audience to take away? Put it front and center rather than at the end.
It is important to provide positive feedback to people who truly worked hard for the good of the team and also to make suggestions to those you perceived as not working as effectively on team tasks. We ask you to provide an honest assessment of the contributions of the members of your team, including yourself. The feedback you provide should reflect your judgment of each team member’s:
Preparation: were they prepared during team meetings?
Contribution: did they contribute productively to the team discussion and work?
Respect for others’ ideas: did they encourage others to contribute their ideas?
Flexibility: were they flexible when disagreements occurred?
Your teammates’ assessments of your contributions and the accuracy of your self-assessment will be considered as part of your overall project score.
Submission will be handled through GitHub. All teams must use a single shared GitHub repository. If we cannot access your work because these directions are not followed correctly, we will not grade your work. You will need to specify your project GitHub URL in the project proposal form. Store the following in your GitHub repository:
IPython Notebook - Your project process book.
Data - Include all the data that you used in your project. If the data is too large for GitHub, store it with a cloud storage provider such as Dropbox or YouSendIt.
README - The README file must give an overview of what you are handing in: your project notebook, any non-standard Python libraries you used, and so on. The README must contain URLs to your project websites and screencast videos.
Project Scope - Did you choose the appropriate complexity and level of difficulty of your project?
Process Book - Did you follow the data science process and is it well documented in your process book?
Solution - Is your analysis effective and correct in answering your intended questions?
Implementation - What is the quality of your code? Is it appropriately polished, robust, and reliable?
Presentation - Are your web site and screencast clear, engaging, and effective?
Peer Evaluations - Your individual project score will also be influenced by your peer evaluations.