The project is a major graded component of the course (30% of the grade). It lets you choose a dataset and a question of interest, run analyses, and communicate your results. Milestone P1 is submitted by filling out a Google Form, while Milestones P2 and P3 are submitted through a GitHub repository that must contain the required deliverables by the deadline. The repositories will be collected automatically at that time.

Schedule

The schedule for the projects is as follows:

  • Milestone P1, due 23:59 CET, 4 Oct 2024 (10% of the project): To be done individually. Each student submits an outline of project ideas of up to 500 words by filling out a Google Form. We will grade the creativity and clarity of the proposed ideas.
  • Milestone P2, due 23:59 CET, 15 Nov 2024 (20% of the project): To be done as a team, where the team submits a GitHub repository that includes: (1) a well-organized README containing the detailed project proposal (up to 1000 words) and (2) code containing initial analyses and data handling pipelines. We will grade the correctness, quality of code, and quality of textual descriptions.
  • Milestone P3, due 23:59 CET, 20 Dec 2024 (70% of the project): To be done as a team, where the team submits a data story using a platform of its choice, together with the project GitHub repository containing the final code. We will grade the overall data story and the associated code for correctness, quality of code, and quality of textual descriptions.
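To make the weighting concrete, the short sketch below converts each milestone's share of the project into its share of the overall course grade, using only the percentages stated above. The variable names are illustrative.

```python
# Share of the course grade allotted to the project, in percent.
PROJECT_SHARE_OF_COURSE = 30

# Share of the project allotted to each milestone, in percent.
milestone_share_of_project = {"P1": 10, "P2": 20, "P3": 70}

# Each milestone's share of the course grade, in percent.
course_weights = {
    name: PROJECT_SHARE_OF_COURSE * share / 100
    for name, share in milestone_share_of_project.items()
}

for name, weight in course_weights.items():
    print(f"{name}: {weight:.0f}% of the course grade")
```

In other words, P1, P2, and P3 account for 3%, 6%, and 21% of the course grade, respectively.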

The bulk of your work should be over before Christmas, in order for you to focus on the exam (and exams of other classes). Note: Additional details about each project milestone are available below.

P1: First glimpse at the data

For Milestone P1, the first task for each team will be to select a dataset. We provide a variety of datasets that you can choose from. After selecting a dataset, each team member will individually perform the following tasks:

  1. Read the paper(s) relevant to the chosen dataset. Please see Column G of the dataset Google sheet. If you don’t fully grasp the technical details of the proposed methods, that’s totally fine. What matters is that you understand what the dataset is and how it was derived.
  2. Familiarize yourself with the chosen dataset. The best way to do this is by playing around with it, for example, by extracting summary statistics and going through different small samples of the dataset. Note that there is no need to load and perform an in-depth analysis of the entire dataset for Milestone P1.
  3. Once you have explored the dataset, propose exactly three bold and creative project ideas that could be pursued with your chosen dataset. At this stage, it does not matter whether the ideas are easily feasible, but you should still consider what additional data you might need to realize them. The ideas proposed in Milestone P1 will not necessarily become the project you eventually do. The purpose of this first milestone is to get the juices flowing, put you in a creative mode, and, at the same time, get your hands dirty! For each idea, clearly state: (1) the overall goal (title) of the project, (2) the high-level research questions, and (3) the high-level steps toward a solution for each research question (no need for precise method implementations).
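As a starting point for step 2, here is a minimal sketch of a lightweight first look at a dataset: it reads only the first few rows of a CSV file and computes summary statistics for one numeric column. It assumes the dataset is available as CSV, which may not hold for every dataset on offer. The helper names `peek` and `summarize`, the column names, and the demo values are all made up for illustration.

```python
import csv
import io
import itertools
import statistics


def peek(csv_file, n_rows=1000):
    """Read at most n_rows rows from an open CSV file, leaving the rest unread."""
    reader = csv.DictReader(csv_file)
    return list(itertools.islice(reader, n_rows))


def summarize(rows, column):
    """Return basic summary statistics for one column, skipping non-numeric cells."""
    values = []
    for row in rows:
        try:
            values.append(float(row.get(column)))
        except (TypeError, ValueError):
            continue  # missing or non-numeric cell
    if not values:
        return {"count": 0}
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


# Tiny in-memory stand-in for a real dataset file (made-up values).
demo = io.StringIO("title,year,rating\nA,1999,7.5\nB,2005,\nC,2010,8.5\n")
sample = peek(demo, n_rows=2)        # only the first two rows are read
stats = summarize(sample, "rating")  # the empty rating in row B is skipped
print(stats)
```

With a real file, the same helpers could be called as `with open("dataset.csv", newline="", encoding="utf-8") as f: sample = peek(f)`, where `dataset.csv` is a placeholder path. Reading a bounded number of rows keeps the exploration fast, which matches the note that an in-depth analysis of the entire dataset is not needed for Milestone P1.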

P1 deliverable: An outline of project ideas of up to 500 words (done individually), submitted by filling out a Google Form. We will grade the creativity and clarity of the proposed ideas. Note that we will not grade any code for this first milestone.

P2 & P3 - more information soon!