
the influence of history on the movie industry


Literature is a mirror of history. Or at least, some forms of literature can in a way be interpreted as memories of the past. Authors write about the challenges, problems and fates of individuals and society, which sometimes unintentionally reflect their personal perspective on the times they live in, even when there is no evident connection to a specific historical moment.

Our question is: can such a statement be applied to other media formats as well, or more specifically to movies? Surely many movies serve purely to entertain. But, as in literature, some degree of reflection of contemporary events and developments could be hidden in the plots or characteristics of a movie, a kind of reflection of history.

In this data story, we explore exactly this: we analyse the impact of history on the types of movies that have been produced. The CMU Movie Summary Corpus serves as our source of movie metadata and plot summaries. Using several tests, we determine whether specific events and developments in history have influenced various parameters of the movie industry, such as the plots of movies or the types of movies that were produced.


The analysis presented here combines historical events with the CMU Movie Summary Corpus to find out how history affects the movie industry. For this, we choose a set of historical events and developments that had a major impact on the world during the period covered by the dataset. These events are all unique, and their possible influences on the movie industry can be derived from their distinct characteristics. We test whether a change in the corresponding characteristics of the movie industry is measurable within a reasonable time frame after each event. By testing several independent events, we can detect a pattern of whether and how history overall is reflected in movies.

abrupt changes vs. slow developments

History is shaped by different kinds of change, which we group into two categories: abrupt changes and slow developments. Abrupt changes are one-time events that have a precise date and change the world immediately. An example of an abrupt change is the end of World War II. Slow developments are not defined by a single event; instead, they are characterised by a series of events that all influence the development of one common topic. The change provoked by each of these events is usually less impactful than an abrupt change, but it builds up slowly over the series of events. Therefore, we need a different approach to measure their impact on the movie industry. An example of a slow development is gender neutrality. Our analysis keeps these two types of historical change apart and tries to determine whether one of them has a greater influence on the movie industry than the other.

choosing important events from history…

To measure the impact of history on the movie industry, we choose a number of abrupt-change events and slow-development topics. The selection was inspired by this article, which lists the historical events its author rates as the "most important" of each year. From there, we chose the following events:

These events all fall under our definition of abrupt changes. In addition, they represent historical developments of different categories: the end of WWII and 9/11 are related to war, the Moon Landing and a machine beating a chess champion are scientific breakthroughs, the energy crisis and the 1980s recession are economic, and Hurricane Katrina is a natural catastrophe. This diversity is interesting for our analysis, as different categories might influence the movie industry to different degrees.

For the slow developments, the following topics were chosen:

research questions

After all these definitions, it is time to specify our research questions:

  1. Is there a measurable influence of history on the movie industry? How strong is this influence?
  2. Can we say that detected changes in movies are influenced by history, or are these changes simply the natural development of the movie industry?
  3. Do abrupt changes or slow developments have a greater influence on the movie industry?
  4. Do some categories of historical subjects have more influence than others?

data preparations

After familiarising ourselves with the content of the CMU Movie Summary Corpus, we performed data scraping as the last preparation step. We won't bother you with the details of this process here; if you're interested, please have a look at our team's GitHub page.


To analyse the abrupt changes, we performed a semantic analysis of the plot summaries in order to determine the frequency of words identified as related to the subject that each event represents. For World War II, for example, we identified all words in the plot summaries that are related to the subject "war", for movies released within a certain time window (the buffer time) before and after the event, and measured how often they occur (the frequency). We then performed a regression analysis to determine whether the frequency of these words in the plot summaries can be predicted from the event. We performed this analysis for three buffer times: 5, 10 and 15 years.
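The frequency measure and the before/after split described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the exact pipeline from our GitHub page: the keyword set `WAR_WORDS` is a hypothetical stand-in for the war-related vocabulary, "frequency" is taken to be the share of word tokens in a summary that are keywords, and movies released in the event year itself are counted as "after".

```python
import re
from dataclasses import dataclass


@dataclass
class Movie:
    year: int     # release year
    summary: str  # plot summary text


# Hypothetical keyword set for the subject "war" (illustration only).
WAR_WORDS = {"war", "soldier", "soldiers", "battle", "army", "bomb"}


def keyword_frequency(summary, keywords):
    """Share of the summary's lowercase word tokens that belong to `keywords`."""
    tokens = re.findall(r"[a-z]+", summary.lower())
    if not tokens:
        return 0.0
    return sum(token in keywords for token in tokens) / len(tokens)


def buffer_window(movies, event_year, buffer_years, keywords):
    """Return (after_event, frequency) pairs for movies within the buffer time.

    after_event is 0 for movies released before the event year and 1 for
    movies released in the event year or up to `buffer_years` after it.
    Movies outside the window are dropped.
    """
    rows = []
    for movie in movies:
        if event_year - buffer_years <= movie.year < event_year:
            rows.append((0, keyword_frequency(movie.summary, keywords)))
        elif event_year <= movie.year <= event_year + buffer_years:
            rows.append((1, keyword_frequency(movie.summary, keywords)))
    return rows


movies = [
    Movie(1942, "A soldier fights in the war."),
    Movie(1950, "A family drama about love."),
    Movie(1970, "A war story told decades later."),  # outside a 5-year buffer
]
print(buffer_window(movies, 1945, 5, WAR_WORDS))
```

The resulting pairs are exactly the input a regression of frequency on an "after the event" indicator needs; running the same function with `buffer_years` set to 5, 10 and 15 gives the three buffer times.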

world war ii (buffer times: 5, 10 and 15 years)

(Figures: frequency of war-related words for buffer times of 5, 10 and 15 years.)

moon landing (buffer time: 10 years)


energy crisis (buffer time: 10 years)


machine tops chess champ (buffer time: 10 years)


9/11 (buffer time: 10 years)


katrina (buffer times: 5, 10 and 15 years)

(Figures: frequency of topic words around Hurricane Katrina for buffer times of 5, 10 and 15 years.)

The following table presents the results of the regression analysis (p-values and coefficients for the prediction variable "theme of the event").

Table 1: Results of the regression analysis of abrupt changes.

event                     p-value   prediction coefficient
World War II (5 years)    0.008     -0.0162
World War II (10 years)   0.002     -0.0114
World War II (15 years)   0.014     -0.0067
Moon Landing (5 years)    0.116     -0.0025
Moon Landing (10 years)   0.016     -0.0024
Moon Landing (15 years)   0.038     -0.0019
Energy Crisis (5 years)   0.440     -0.0004
Energy Crisis (10 years)  0.456     -0.0002
Energy Crisis (15 years)  0.604     -0.0001
Chess Machine (5 years)   0.037     -0.0005
Chess Machine (10 years)  0.002     -0.0005
Chess Machine (15 years)  0.002     -0.0004
9/11 (5 years)            0.001     -0.0023
9/11 (10 years)           0.000     -0.0032
9/11 (15 years)           0.000     -0.0039
Katrina (5 years)         0.002     -0.0009
Katrina (10 years)        0.000     -0.0009
Katrina (15 years)        0.000     -0.0007

The values in the table should be interpreted as follows. The p-value indicates the statistical significance of the predictor variable: if it is below 0.05, the event is a significant predictor of the word frequency, and we take this as a sign that the event has had an impact on the movie industry. The prediction coefficient indicates the size and direction of this effect: a negative value means that the predicted word frequency goes down after the event.
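With a single 0/1 predictor (0 = before the event, 1 = after it), an ordinary least-squares regression reduces to comparing the two group means. The sketch below is our own illustration, not the code behind Table 1: it assumes such a binary predictor and, to stay within Python's standard library, approximates the t-distribution of the test statistic by a standard normal, which is only reasonable for large samples. A library such as statsmodels would use the exact t-distribution.

```python
import math
from statistics import fmean


def binary_ols(x, y):
    """Fit y = a + b * x by OLS for a 0/1 predictor x.

    Returns (b, p_value). The slope b equals mean(y | x = 1) - mean(y | x = 0);
    the two-sided p-value uses a normal approximation to the t-distribution.
    """
    y0 = [yi for xi, yi in zip(x, y) if xi == 0]
    y1 = [yi for xi, yi in zip(x, y) if xi == 1]
    n0, n1 = len(y0), len(y1)
    if n0 == 0 or n1 == 0 or n0 + n1 < 3:
        raise ValueError("need observations before and after the event")
    m0, m1 = fmean(y0), fmean(y1)
    b = m1 - m0  # OLS slope for a binary predictor
    # Residual variance with n - 2 degrees of freedom (intercept + slope).
    sse = sum((v - m0) ** 2 for v in y0) + sum((v - m1) ** 2 for v in y1)
    sigma2 = sse / (n0 + n1 - 2)
    se = math.sqrt(sigma2 * (1 / n0 + 1 / n1))
    if se == 0:
        return b, (0.0 if b != 0 else 1.0)
    t = b / se
    p_value = math.erfc(abs(t) / math.sqrt(2))  # two-sided tail of N(0, 1)
    return b, p_value


x = [0, 0, 0, 1, 1, 1]
y = [0.05, 0.06, 0.07, 0.02, 0.03, 0.04]
slope, p_value = binary_ols(x, y)
print(slope, p_value, p_value < 0.05)
```

In this toy example the slope is about -0.03 (the frequency drops after the "event") and the p-value lies well below 0.05, which is the pattern of a significant, negative coefficient as in most rows of Table 1.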

We can thus see that five of the six events analysed are significant for at least one buffer time. This indicates that history does indeed have an impact on the movie industry. However, since the prediction coefficients are negative, the events were followed by a decrease in the popularity of their topic, which is counterintuitive to us.

The highest significance (that is, the lowest p-value) is usually found for a buffer time of 10 years. This suggests that historical events influence the movie industry within a time frame of about 10 years.
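Both statements can be checked directly against Table 1. The snippet below only copies the reported (rounded) values; it counts the events that are significant at the 0.05 level for at least one buffer time, checks the sign of the coefficients, and lists for each event the buffer time(s) with the lowest p-value.

```python
ALPHA = 0.05

# (p-value, prediction coefficient) per buffer time in years, copied from Table 1.
TABLE_1 = {
    "World War II": {5: (0.008, -0.0162), 10: (0.002, -0.0114), 15: (0.014, -0.0067)},
    "Moon Landing": {5: (0.116, -0.0025), 10: (0.016, -0.0024), 15: (0.038, -0.0019)},
    "Energy Crisis": {5: (0.440, -0.0004), 10: (0.456, -0.0002), 15: (0.604, -0.0001)},
    "Chess Machine": {5: (0.037, -0.0005), 10: (0.002, -0.0005), 15: (0.002, -0.0004)},
    "9/11": {5: (0.001, -0.0023), 10: (0.000, -0.0032), 15: (0.000, -0.0039)},
    "Katrina": {5: (0.002, -0.0009), 10: (0.000, -0.0009), 15: (0.000, -0.0007)},
}

# Events with p < ALPHA for at least one buffer time.
significant_events = [
    event
    for event, rows in TABLE_1.items()
    if any(p < ALPHA for p, _ in rows.values())
]

# Are all prediction coefficients negative?
all_negative = all(coef < 0 for rows in TABLE_1.values() for _, coef in rows.values())

# Buffer time(s) reaching the lowest p-value for each event (ties are kept).
best_buffers = {}
for event, rows in TABLE_1.items():
    lowest = min(p for p, _ in rows.values())
    best_buffers[event] = [buffer for buffer, (p, _) in rows.items() if p == lowest]

print(len(significant_events), "of", len(TABLE_1), "events significant")
print("all coefficients negative:", all_negative)
print(best_buffers)
```

Because the table rounds p-values to three decimals, the Chess Machine, 9/11 and Katrina rows tie between 10 and 15 years; the only event whose lowest p-value falls at a different buffer time is the non-significant Energy Crisis.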

For the slow developments, we again looked at the frequencies of words in the plot summaries that are related to the theme. This time there is no buffer time to define; instead, we look at the time spans between the events we chose. That way we obtain a picture of the overall impact of all events that are part of the same slow development.
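One way to make "the time spans between the events" measurable is to label every year by how many of the chosen events have already happened and to average the word frequency within each span. This is a hypothetical sketch: the yearly frequencies and event years below are made up for illustration, and the span index could then serve as the predictor in a regression like the one used for abrupt changes.

```python
from bisect import bisect_right
from statistics import fmean


def span_means(yearly_freq, event_years):
    """Average yearly keyword frequency per span between consecutive events.

    Span k holds the years in which exactly k events have already happened;
    an event year belongs to the span that starts with that event.
    """
    edges = sorted(event_years)
    spans = {}
    for year, freq in yearly_freq.items():
        k = bisect_right(edges, year)  # number of events up to and including `year`
        spans.setdefault(k, []).append(freq)
    return {k: fmean(values) for k, values in sorted(spans.items())}


# Hypothetical inputs, for illustration only.
yearly = {1960: 0.010, 1965: 0.012, 1970: 0.020, 1980: 0.025}
print(span_means(yearly, [1964, 1975]))
```

A steadily rising (or falling) mean from one span to the next would point to a cumulative effect of the series of events, which is what distinguishes a slow development from a single abrupt change.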

gender neutrality: frequency


racism: frequency


For gender neutrality, we additionally looked at the movie metadata to deepen the analysis. We identified the trends in the ratio of actors to actresses and in the age gap between actresses and actors.
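Both metadata measures can be computed per release year from the cast records. A minimal sketch, assuming each record is a (release year, gender, age at release) tuple with gender encoded as "M" or "F"; we define the age gap as the mean age of actors minus the mean age of actresses, and skip records with a missing gender or age.

```python
from collections import defaultdict
from statistics import fmean


def gender_metrics(cast):
    """Return {year: (actors_per_actress, age_gap)} for years with both genders.

    age_gap = mean age of actors ("M") minus mean age of actresses ("F").
    """
    ages = defaultdict(lambda: {"M": [], "F": []})
    for year, gender, age in cast:
        if gender not in ("M", "F") or age is None:
            continue  # skip records with missing gender or age
        ages[year][gender].append(age)
    metrics = {}
    for year in sorted(ages):
        men, women = ages[year]["M"], ages[year]["F"]
        if men and women:  # the ratio is undefined without actresses
            metrics[year] = (len(men) / len(women), fmean(men) - fmean(women))
    return metrics


cast = [
    (2000, "M", 40), (2000, "M", 50), (2000, "F", 30),
    (2001, "F", 35), (2001, None, 28),
]
print(gender_metrics(cast))  # {2000: (2.0, 15.0)}
```

Under these definitions, parity corresponds to a ratio of 1 actor per actress and an age gap of 0 years.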

gender neutrality: ratio (actors per actress)


gender neutrality: gender gap


Again, the results of the regression analysis (p-values and coefficients for the prediction variable “theme of the event”) are listed in the table below.

Table 2: Results of the regression analysis of slow developments.

event                           p-value   prediction coefficient
gender neutrality (frequency)   0.184     0.0073
gender neutrality (ratio)       0.000     -6.8793
gender neutrality (age gap)     0.000     -29.7284
racism (frequency)              0.050     -0.0006

The frequency analysis of slow developments does not tell us much, as neither p-value lies clearly below the 0.05 threshold (0.184 for gender neutrality and 0.050 for racism, right at the threshold). However, the ratio of actors per actress and the age gap are highly significant and have coefficients that are large in magnitude. Negative coefficients are what we would expect here, because gender equality (at least as far as these measures can tell) is reached when the ratio of actors per actress falls to one and the age gap falls to zero.

This lets us conclude that the movie industry is becoming more aware of gender neutrality and is working towards it. For the ratio of actors to actresses and the age gap, it also lets us conclude that historical events have an impact on that development.

A final analysis for gender neutrality was based on the character types identified for roles played by actresses. Although very little data is available for this, a trend can be identified. In the pictures below, the more frequently a character role was played, the larger it is printed. As we can see, the characters that actresses play have diversified. However, the most frequently played characters can still be associated with rather patriarchal views of the role of women in society. This again shows that there is progress concerning gender equality in the movie industry, but it does not let us conclude anything about the impact of history on it.
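The sizing rule behind such a picture (a word cloud) can be sketched as a linear mapping from how often a role occurs to a font size. The role labels and the size range of 10 to 48 points below are hypothetical; word-cloud libraries apply their own scaling rules.

```python
from collections import Counter


def role_font_sizes(roles, min_size=10, max_size=48):
    """Map each role to a font size that grows linearly with its frequency."""
    counts = Counter(roles)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {
        role: min_size + (count - lo) * (max_size - min_size) / span
        for role, count in counts.items()
    }


# Hypothetical role labels, for illustration only.
print(role_font_sizes(["mother", "mother", "mother", "wife", "wife", "doctor"]))
# {'mother': 48.0, 'wife': 29.0, 'doctor': 10.0}
```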

In addition to the analysis of abrupt changes presented above, we looked at the different movie genres and determined whether their percentage distribution changed within 10 years before and after the event. For World War II, the findings were interesting:

world war ii: genres

(Figures: genre distribution before, during and after World War II.)

The percentage distribution of genres appears to have changed drastically between the periods before and during World War II and the period after it. After the war, the share of dramas increased sharply. This suggests that, in this case, the historical development may have had an important impact on the movie industry.
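The genre shares compared in this section can be computed per time window as sketched below. Two assumptions are ours: a movie with several genres counts once for each of its genres, and the window boundaries around World War II (1939 to 1945) are hypothetical choices of 10 years before and after the war.

```python
from collections import Counter


def genre_shares(movies, start, end):
    """Percentage of each genre among all genre labels of movies released in [start, end].

    `movies` is a list of (release_year, [genre, ...]) pairs.
    """
    counts = Counter(
        genre for year, genres in movies if start <= year <= end for genre in genres
    )
    total = sum(counts.values())
    return {genre: 100 * n / total for genre, n in counts.items()} if total else {}


# Hypothetical windows around World War II.
WINDOWS = {"before": (1929, 1938), "during": (1939, 1945), "after": (1946, 1955)}

# Hypothetical movies, for illustration only.
movies = [
    (1935, ["Comedy"]),
    (1940, ["Drama", "War"]),
    (1950, ["Drama"]),
    (1952, ["Comedy", "Drama"]),
]
for name, (start, end) in WINDOWS.items():
    print(name, genre_shares(movies, start, end))
```

Counting each genre label rather than each movie keeps the shares summing to 100 % per window even when movies carry several genres; counting movies instead would be an equally valid choice that gives different numbers.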

As a third part of the analysis of abrupt changes, we identified the words most significantly associated with the subject of each event and then analysed, again for 10 years before and after the event, how the presence of these words in the plot summaries changed. Here are the results:

world war ii: presence of topic


As we can see, in the case of WWII, the topic is distinctly more present during the war than before or after it. Two aspects of this observation are interesting: the quick response of the movie industry (we found above that a response usually occurs within 10 years) and the decrease of the topic after the event. This decrease could perhaps be explained by the uncertainty, widespread in literature after WWII, about how to process the horrors of war.

machine tops chess champ: presence of topic


9/11: presence of topic


katrina: presence of topic


In these plots, the message is again mixed: the presence of the topic decreases after the chess machine's victory and after 9/11, but increases after Hurricane Katrina. Given the negative prediction coefficients of the predictor "theme of the event", the first two results are consistent with the regression. For Hurricane Katrina, however, there seems to be some other influence that is stronger than that of the event itself. Within the scope of this analysis, we cannot determine whether this other influence is connected to historical events.
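The presence measure used in this part can be sketched as the percentage of plot summaries that mention at least one topic word. Unlike the frequency measure used in the regression, a summary counts once no matter how many topic words it contains. The topic words and summaries below are hypothetical stand-ins for the words we identified, and the topic words are assumed to be lowercase.

```python
import re


def topic_presence(summaries, topic_words):
    """Percentage of summaries containing at least one word from `topic_words`."""
    if not summaries:
        return 0.0
    hits = sum(
        1
        for text in summaries
        if topic_words & set(re.findall(r"[a-z]+", text.lower()))
    )
    return 100 * hits / len(summaries)


# Hypothetical topic words and summaries, for illustration only.
topic = {"army", "bomb"}
summaries = [
    "The army invades.",
    "Two friends travel.",
    "A bomb explodes in the city.",
    "A quiet romance.",
]
print(topic_presence(summaries, topic))  # 50.0
```

Computing this percentage separately for the summaries before, during and after an event gives the kind of comparison shown in the presence plots.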


Through the regression analysis we tried to isolate the influence of specific events on the movie industry. However, no overall tendencies emerged. We discovered trends in some domains, but had to realise that deeper analysis contradicted them. In conclusion, we have to admit that our analysis has not revealed very much about the connection between historical events and the movie industry. We can say that there is a significant relation between them, but we do not (yet) understand this connection in detail. To deepen this understanding, the study would have to cover a much larger number of events: with only eight events, no general pattern can be established with statistical confidence.

Brought to you by ADAcADAbra, a group of students at EPFL.

If you're interested in the exact data pipelines and methods we used to write this data story, have a look at our team's GitHub page.

This project was carried out as part of the course Applied Data Analysis, taught by Robert West at EPFL. Many thanks to the whole ADA team for their support!