Exploring the World of Cinema through Data: An ADAnalysis

Introduction & Research Questions

The quest to create a captivating and memorable movie has fascinated filmmakers and audiences alike for decades. What makes a good movie remains a perpetual mystery, with numerous factors contributing to the overall cinematic experience. From the director’s vision and the harmonious blend of music to the chosen genre, the elements at play are diverse and interconnected. In this analysis, however, we narrow our exploration to one particular aspect of the cinematic universe: the actors. Restricting the analysis to the three main characters of a movie gives a different, and perhaps more faithful, picture than examining the whole cast. Take “James Bond”: while there are plenty of side characters, they tend to stay in the background, so analyzing every character could yield features that differ from the true spirit of the movie. Focusing on the main characters is like peeling back the layers to reveal what the movie is really about.

As we dive into the vast realm of film ratings, our inquiry takes shape around a central question: Is there a pattern that distinguishes the main actors of well-rated movies from those of poorly rated ones? In simpler terms, do viewers show a preference for certain types of main actors, and is an actor’s presence correlated with the overall success of a film?

To answer this question, we will look at the following points:

Building our Dataset

Although the raw data has already been gathered, it is not yet ready for analysis: it may contain errors, inconsistencies, and missing values. The clarity and accuracy of our findings depend directly on the quality of our data, a classic case of “garbage in, garbage out”.

Data Sources

We used the Movie Summaries dataset provided by the course alongside the extensive IMDb dataset, which offers supplementary information on movies and actors. A pivotal element of our study was the Academy Awards Database. This comprehensive record details past Academy Award winners and nominees from 1927 to 2023.

Understanding our Data

Our data preparation streamlined the dataset in three steps. First, we standardized all dates to years only, since year-level granularity is all our analysis requires. We then removed unused features from the IMDb database, along with any titles not classified as movies. Finally, we removed duplicate entries so that every record in the dataset is unique.
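The three cleaning steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not our actual pipeline: the records are made up, the field names (`tconst`, `titleType`, `primaryTitle`, `startYear`) loosely follow the layout of IMDb's `title.basics` file, dates are assumed to arrive in mixed formats, and the helpers `to_year` and `clean` are hypothetical. Standardizing dates before deduplicating matters here: two rows that differ only in date format ("1999-05-21" vs "1999") collapse into a single record.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical raw records (made-up values). Field names loosely follow
# IMDb's title.basics layout; "originalTitle" stands in for an unused feature.
RAW_TITLES: List[Dict[str, str]] = [
    {"tconst": "tt0000001", "titleType": "movie", "primaryTitle": "Film A",
     "startYear": "1999-05-21", "originalTitle": "Film A"},
    {"tconst": "tt0000002", "titleType": "tvEpisode", "primaryTitle": "Episode 1",
     "startYear": "2005", "originalTitle": "Episode 1"},
    {"tconst": "tt0000001", "titleType": "movie", "primaryTitle": "Film A",
     "startYear": "1999", "originalTitle": "Film A"},
    {"tconst": "tt0000003", "titleType": "movie", "primaryTitle": "Film B",
     "startYear": "\\N", "originalTitle": "Film B"},  # IMDb writes \N for missing
]

# Assumed set of features kept for the analysis.
KEEP_COLUMNS: Tuple[str, ...] = ("tconst", "primaryTitle", "year")

# Matches a four-digit year at the start of the string.
YEAR_RE = re.compile(r"^(\d{4})")


def to_year(raw: str) -> Optional[int]:
    """Reduce '1999-05-21', '1999-05' or '1999' to 1999; None if no leading year."""
    match = YEAR_RE.match(raw.strip())
    return int(match.group(1)) if match else None


def clean(records: List[Dict[str, str]]) -> List[Dict[str, object]]:
    seen = set()
    cleaned: List[Dict[str, object]] = []
    for rec in records:
        # Keep only titles classified as movies.
        if rec["titleType"] != "movie":
            continue
        # Standardize the date to a year, then keep only the features we use.
        row = dict(rec, year=to_year(rec["startYear"]))
        row = {key: row[key] for key in KEEP_COLUMNS}
        # Drop exact duplicates; rows differing only in date format now match.
        key = tuple(row[k] for k in KEEP_COLUMNS)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


if __name__ == "__main__":
    for row in clean(RAW_TITLES):
        print(row)
```

Missing years are kept as `None` rather than dropped, since the text above does not say how they were handled; a real pipeline would make that choice explicitly.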