We are going to take you on an amazing journey inside the world of Hollywood. Enjoy the read.
Drama story
Please pretend to be an emerging movie director. You are young, with no experience at all, but you studied hard at the renowned Action-Drama Academy, commonly known as ADA. You have just graduated. That night you get drunk with your friends to celebrate your achievement. Your phone rings. An agent from Hollywood is calling you. You are too far gone to really understand what is happening, and the only neuron you have left says: “It is probably just a fake call”. Therefore, you confidently pretend to be the best student in history.
You wake up. It is 2 P.M. and you have no recollection of these events. Suddenly, a message arrives: “Thank you for the nice talk. As we discussed yesterday, I will come back to you for your movie proposal next year. You have a rather modest budget at your disposal and you are free to make whatever choices you want. Good luck.”
Now that all of your neurons are awake, you understand the big problem you have: you are asked to produce a CHEAP movie (otherwise it would have been too easy). You know that if you miss this chance, all of your hard work at the academy is going to be wasted. It is time for big responsibilities. Do you want to make a difference in the history of cinema?
You have no idea where to start, there are so many different kinds of movies. Which one will bring you success? During your studies, someone told you that there is a consulting company that helps young directors to reach fame after graduation. You look through your stuff, the business card is still there:
You immediately send an email to Ms. Monani. You receive a fast answer, which is already a good sign. You schedule a meeting with the boss of the NoLemonNoMelon company in a few days.
The day has come. Two minutes before the meeting time, you click on the Zoom link Lorance gave you. Of course, you haven’t updated Zoom, so you connect late (otherwise it would have been too easy). During the discussion you and Ms. Monani make a deal: she is going to collect and analyse all the movie data she finds in order to provide you with insights into what kind of movie you should produce. In exchange you will pay her 5% of your revenue.
Ms. Monani and her amazing team work day and night to present the analysis as soon as possible.
After 2 weeks you receive an envelope with the results from the consulting company you hired.
Letter
Dear future movie director,
My team and I worked full-time to provide you with this report. We hope you will appreciate the analysis we made.
We stay at your disposal for any additional questions.
Best regards,
Lorance Monani
Introduction
In this report you will find a detailed study of the relationship between movie rating, revenue, and budget throughout the last 60 years. We also analysed the genres in order to give you an idea of which movie type is more likely to obtain a good rating and high revenue. Lastly, we were interested in low budget movies with high revenue and ratings. We investigated possible common characteristics of these movies and how they evolve with time.
In the conclusion, you will find our suggestion for creating an amazing movie with a low budget.
Datasets
We gathered a huge collection of movies from the past century and the beginning of this century. The CMU Movie Summary Corpus served as our base dataset. To enable a complete overview of the movie distribution, we completed it using data from IMDB, TMDB and Wikipedia, and corrected monetary values using inflation data. The next figure gives an idea of the number of movies per year contained in the database used for the analysis. The number of movies produced in the twenty-first century has skyrocketed, exceeding 10,000 per year.
Rating, Revenue and Budget
The success of a movie is determined by 3 main parameters: its revenue, its budget and its rating. We first analysed the distributions of these features and their relationship. As such data is scarce, we had to take a subset of approximately 5,000 movies, which are distributed throughout the years as shown in the figure below:
The IMDB ratings we worked with are in the range [0,10], where 10 is the maximum and 0 is the minimum, with a mean of 6.5. The distributions of revenue and budget are centred around 100 million US dollars, with a heavy-tailed shape, meaning that the values are spread across several orders of magnitude (you might see these on a logarithmic scale in some plots for this reason). For budget and revenue, it is important to correct the data for monetary inflation in order to be able to compare different years. Without the monetary correction, there would seem to be a fairly constant increase in these features from 1959 to 2021, when there is actually none. As can be seen in the figure below, there is higher variability during the first years (1959-1973), which is probably due to the low number of movies during those years compared to more recent years.
In both cases (budget, revenue), we see that the median value is below the mean value. This is due to the presence of huge outliers in both variables (heavy-tailed). The shaded areas highlight the 95% confidence interval around these values, as does every error bar in this report.
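The inflation correction mentioned above can be sketched as follows. This is a minimal illustration, not our actual pipeline: the price index values in `CPI` are made-up placeholders, and the function name `adjust_for_inflation` is hypothetical.

```python
# Hypothetical price index values (placeholders, NOT real inflation data).
CPI = {1960: 30.0, 1990: 130.0, 2021: 270.0}


def adjust_for_inflation(amount, year, target_year=2021, cpi=CPI):
    """Express `amount` (in dollars of `year`) in dollars of `target_year`."""
    return amount * cpi[target_year] / cpi[year]


# A budget of 1 million dollars in 1960, expressed in 2021 dollars
# under the placeholder index above.
print(adjust_for_inflation(1_000_000, 1960))  # 9000000.0
```

Once every budget and revenue is expressed in dollars of the same reference year, values from different decades can be compared directly.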
The third variable, the rating, seems to have decreased steadily on average from the beginning of the timeline (1959) to now, suggesting that it is more difficult to get a high rating now than it was before. This trend might be explained by various factors. It could be that so many movies are available nowadays through various platforms that the public has become more demanding. Another explanation is that the grading community itself might have changed with the advent of the internet. A known phenomenon in other areas suggests that bad experiences are more likely to result in a review, and this effect might have increased with the possibility of grading in a few clicks. Furthermore, some bad old movies may have been forgotten and not included in the databases at all. These are only speculations about the trend; what remains is that it might be harder today to reach a good grade than it was in the past.
We analysed how these three variables are correlated; the results are reported in the figure below. As previously said, we used the log of budget and revenue to mitigate the huge range spanned by these parameters.
There is a high correlation between the log of budget and the log of revenue, and between the rating and the log of revenue. The surprising (and hopeful for you!) fact is that there is no correlation between the rating and the budget. Even with a low budget, you can imagine producing a film with a high rating, and your rating might (we cannot know at this point) drive revenue. However, a low budget can be a disadvantage when it comes to reaching a high revenue.
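For readers who want to reproduce this kind of check, here is a minimal sketch of a Pearson correlation computed on log-transformed values. We assume a Pearson coefficient purely for illustration; the budgets and revenues below are made-up numbers, not values from our dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Illustrative values only (in US dollars).
budgets = [1e6, 1e7, 1e8, 1e9]
revenues = [2e6, 5e7, 1e8, 3e9]

# Take the base-10 logarithm to tame the heavy tails before correlating.
log_budget = [math.log10(b) for b in budgets]
log_revenue = [math.log10(r) for r in revenues]

r = pearson(log_budget, log_revenue)
print(f"correlation of log budget and log revenue: {r:.2f}")
```

The same function can be applied to the rating and the log of budget to check for the absence of correlation.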
You lift your head from the paper; your eyes are tired. You feel less stressed than before: there is still a chance to make a highly rated movie on a low budget. Ratings are going down over time, so you realize that you should not be surprised if some people won't love your movie. But which movie type is best suited for this role? You keep reading.
Genre
As the choice of genre is a determining factor when producing a movie, we focus on this characteristic. Many movies are associated with multiple genres, which is illustrated by the co-occurrence chord diagram below:
For each genre, the lines looping on themselves represent the appearance of the genre alone, while the lines linking to other genres represent their co-occurrence. Everything is normalized by the total number of movie genre associations. The button lets you increase the minimum number of co-occurrences kept when building the diagram, so that it can be viewed at different resolutions.
Some information resides in genre associations, but most of it still resides in individual genres.
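The counting behind such a diagram can be sketched as below. This is only an illustration under our own assumptions: the function name `genre_cooccurrence` and the toy movie list are hypothetical, a movie with a single genre is counted as a self-loop, and the threshold is applied to raw counts after normalization has been computed.

```python
from collections import Counter
from itertools import combinations

# Toy data: each movie is represented by its set of genres.
movies = [
    {"drama", "romance"},
    {"drama"},
    {"drama", "romance", "comedy"},
    {"horror"},
    {"comedy", "drama"},
]


def genre_cooccurrence(genre_sets, min_count=1):
    """Return normalized genre associations seen at least `min_count` times.

    A single-genre movie contributes a self-loop (genre, genre); a
    multi-genre movie contributes one link per unordered pair of genres.
    """
    counts = Counter()
    for genres in genre_sets:
        if len(genres) == 1:
            (only,) = genres
            counts[(only, only)] += 1
        else:
            for a, b in combinations(sorted(genres), 2):
                counts[(a, b)] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items() if n >= min_count}


all_links = genre_cooccurrence(movies)
strong_links = genre_cooccurrence(movies, min_count=2)
print(sorted(strong_links))
```

Raising `min_count` plays the role of the threshold button: rare associations disappear and only the strongest links remain.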
For the rest of the analysis, we counted each movie in each of its associated genres. The next plot gives an idea of the different genres present in the dataset. We note that drama, documentary and comedy movies are heavily represented compared to other categories such as thriller movies.
As we are interested in the rating, the budget and the revenue variables, we have explored each of them for each genre:
From these representations we note that the genres with the highest ratings are, in order: biography, western and war. Documentaries have the lowest budget and revenue but are well graded. In addition, western and war movies generate high revenue and are well graded. On the other hand, the adventure category shows the highest revenue and budget but is not particularly well rated.
Another interesting value to look at is what we can call rentability: the ratio between revenue and budget. A high revenue alone does not tell us much; it is more informative to know how many times the revenue exceeds the budget. The figure below displays the rentability on a log scale. Documentaries have the highest ratio, followed by horror films, while animation movies have the lowest.
The take-home message from the above plots is that you should avoid animation. Instead, you should focus on documentaries (high rentability and well graded). To a lesser extent, western movies are well rated and have medium rentability. Horror movies have high rentability but low ratings. Biography and war movies are not as rentable as documentaries although they are very well rated.
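As an illustration of the rentability measure, the sketch below computes the revenue/budget ratio per movie and summarizes its base-10 logarithm per genre. The choice of the median as summary, the function name and the numbers are assumptions made for this example only.

```python
import math
from statistics import median

# Illustrative values only (in US dollars).
movies = [
    {"genre": "documentary", "budget": 1e5, "revenue": 1e6},
    {"genre": "documentary", "budget": 2e5, "revenue": 2e7},
    {"genre": "animation", "budget": 1e8, "revenue": 1e8},
    {"genre": "animation", "budget": 5e7, "revenue": 5e8},
]


def log_rentability_by_genre(rows):
    """Median of log10(revenue / budget) for each genre."""
    by_genre = {}
    for row in rows:
        ratio = row["revenue"] / row["budget"]
        by_genre.setdefault(row["genre"], []).append(math.log10(ratio))
    return {genre: median(values) for genre, values in by_genre.items()}


result = log_rentability_by_genre(movies)
for genre, value in sorted(result.items()):
    print(genre, round(value, 2))
```

On a log scale, a value of 1 means the revenue is ten times the budget, and a value of 0 means the movie only broke even.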
Until now, the analyses have covered the entire time period from 1959 to 2021. To better understand our own period, we analysed the evolution of genres over time. A general decrease in rating was observed in the first analysis. However, is this the case for all movie genres? The movies have been divided into two periods, with 2000 as the cut-off point, when an increase in the number of movies is observed. The next figure reports the results on rating, revenue and budget, before and after the cut-off.
Concerning the rating and the revenue, the trend is similar for all genres: the ratings are decreasing (except for action movies), and no movies show a significant increase in their revenue. Science fiction, adventure, and action movies show a significant increase in budget, whereas romance and drama movies undergo a significant decrease in their budget. Thus, a significant increase in budget does not reflect a significant increase in revenue. In the second period, it seems harder to create movies as good as in the past, which could be due to the increasing competition.
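A before/after comparison of this kind can be sketched as follows. We assume here a percentile bootstrap for the 95% confidence interval and that movies released in 2000 itself fall in the second period; both are assumptions for the sake of the example, and the ratings are invented.

```python
import random
from statistics import mean

# Illustrative (year, rating) pairs for one genre; not real data.
movies = [
    (1975, 7.2), (1982, 6.9), (1988, 7.5), (1994, 6.8), (1999, 7.1),
    (2003, 6.1), (2008, 6.4), (2012, 5.9), (2016, 6.6), (2020, 6.2),
]

CUTOFF = 2000  # assumption: the cut-off year belongs to the second period
before = [rating for year, rating in movies if year < CUTOFF]
after = [rating for year, rating in movies if year >= CUTOFF]


def bootstrap_ci(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    means = sorted(
        mean(rng.choices(values, k=len(values))) for _ in range(n_resamples)
    )
    lower = means[int((1 - level) / 2 * n_resamples)]
    upper = means[int((1 + level) / 2 * n_resamples) - 1]
    return lower, upper


before_low, before_high = bootstrap_ci(before)
after_low, after_high = bootstrap_ci(after)
print(f"before {CUTOFF}: mean {mean(before):.2f}, CI [{before_low:.2f}, {before_high:.2f}]")
print(f"after  {CUTOFF}: mean {mean(after):.2f}, CI [{after_low:.2f}, {after_high:.2f}]")
```

When the two intervals do not overlap, as in this toy example, the change between periods is unlikely to be a sampling accident.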
Clustering
Then, we explored how to separate the data to get meaningful results about how to produce a good movie without much money. First, we expanded our dataset with some new features that were not used before. After that, we manually created some subsets of the data according to the budget and our two measures of a movie's success: the average rating and revenue/budget. We wanted to identify specific attributes of good and cheap movies.
Separation into groups according to budget and rating
First, we decided to separate the data into high/low budget and high/low rating categories by taking the 0.25 and 0.75 quantiles as cutoffs. We looked at the genre representation in the different subsets of the data:
The fractions shown in these graphics are relative to the total number of movies of a particular genre in the whole dataset (5,000).
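The quantile-based split can be sketched as below, using Python's standard `statistics.quantiles` with its default method. The helper names and the toy movies are hypothetical, and whether values exactly on a cutoff count as low or high is our assumption.

```python
from statistics import quantiles


def quartile_cutoffs(values):
    """Return the 0.25 and 0.75 quantiles of `values`."""
    q1, _, q3 = quantiles(values, n=4)
    return q1, q3


def label(value, low, high):
    """Label a value as 'low', 'high' or 'middle' relative to the cutoffs."""
    if value <= low:
        return "low"
    if value >= high:
        return "high"
    return "middle"


# Illustrative (title, budget in million dollars, rating) triples.
movies = [
    ("A", 1.0, 8.5), ("B", 2.0, 5.0), ("C", 3.0, 6.5), ("D", 4.0, 7.0),
    ("E", 50.0, 6.0), ("F", 80.0, 4.0), ("G", 100.0, 8.0), ("H", 200.0, 6.8),
]

budget_low, budget_high = quartile_cutoffs([budget for _, budget, _ in movies])
rating_low, rating_high = quartile_cutoffs([rating for _, _, rating in movies])

# Map each title to its (rating label, budget label) pair.
groups = {
    title: (label(rating, rating_low, rating_high),
            label(budget, budget_low, budget_high))
    for title, budget, rating in movies
}
print(groups["A"])  # ('high', 'low')
```

Movies labelled 'middle' on either axis fall outside the four subsets discussed in this section.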
We observed the following:
- high rating - low budget: there are many documentaries, western, war movies.
- high rating - high budget: there are many biographies, history movies. There are also several adventure, science-fiction and war movies, but fewer.
- low rating - high budget: there are many animation, science-fiction, fantasy, family and action movies.
- low rating - low budget: there are many horror movies. There are also several documentaries, but fewer.
We ran statistical tests comparing the high rating - low budget subset with the rest of the data for the specific genres that stood out. Documentary and war movies are significantly more represented in the high rating - low budget subset than in the rest of the data. The trend is similar for western movies, but it does not reach significance.
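One possible way to test whether a genre is over-represented in a subset is a one-sided Fisher exact test on a 2x2 table. The report does not depend on this particular test; it is shown here only as an assumption-laden sketch, with invented counts and a hypothetical function name.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    a: movies of the genre inside the subset
    b: other movies inside the subset
    c: movies of the genre in the rest of the data
    d: other movies in the rest of the data
    Returns the probability of seeing `a` or more genre movies in the
    subset if genre and subset membership were independent.
    """
    subset_size = a + b
    genre_total = a + c
    n = a + b + c + d
    denominator = comb(n, subset_size)
    p_value = 0.0
    for x in range(a, min(subset_size, genre_total) + 1):
        p_value += (
            comb(genre_total, x) * comb(n - genre_total, subset_size - x)
        ) / denominator
    return p_value


# Invented counts: 30 documentaries among 100 subset movies,
# versus 70 documentaries among the 900 remaining movies.
p = fisher_exact_greater(30, 70, 70, 830)
print(f"p-value: {p:.2e}")
```

A small p-value indicates that the genre appears in the subset more often than chance alone would explain.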
We can do the same representation but with the production countries instead of the genres:
We observed the following:
- high rating - low budget: there are more movies from DE, DK, ES, HK, KR than in the other subsets. Almost 50% of the movies made in these countries are in this subset. There are also many FR, GB, JP and AU movies.
- high rating - high budget: there are many movies from NZ.
- low rating - high budget: there are many movies from CH.
- low rating - low budget: there are many movies from RU.
It is interesting to see that the USA is not the major production country in any of the subsets. Indeed, the USA produced the majority of the movies in this dataset, meaning that USA movies mostly have either an average budget or an average rating. It is rather the less usual production countries that appear in the low/high rating and budget subsets. A reason could be that the data are taken from a US-based collection, in which foreign movies must be good to be included.
We ran statistical tests comparing the high rating - low budget subset with the rest of the data for the specific countries that stood out. All the tested countries except JP and AU reached the level of significance.
In the high rating - low budget subgroup we also looked at the attributes of the cast members, such as height, age and the fraction of men. We noticed no difference from the rest of the data. However, when we separated the movies into old and recent ones (cutoff date = 2000), we had some interesting results :
Separation into groups according to budget and revenue
Then, we decided to look at our other measure of a movie's success: revenue/budget. We selected low budget and high revenue/budget movies. First, we looked at the genre representation within this subset:
A quick overview of this subgroup highlights the strong presence of documentaries. There are significantly more documentary, horror and mystery movies in this subset than in the rest of the data. It is interesting to see that these are not exactly the same genres that stood out in the high rating - low budget subset.
We looked at the production countries within this subset and found that there are significantly more movies produced in DE, DK, ES and KR in the high revenue/budget - low budget subset than in the rest of the data:
By performing a cast analysis again, we compared this subset with the rest of the data. We could draw the same conclusions as for the comparison of the actor attributes between the high rating - low budget subset and the rest of the data. However, we observed significantly more female actors in recent high revenue/budget - low budget movies compared to the rest of the data. This was only a tendency in the high rating - low budget subset.
Conclusion
Here, you can find our final suggestion.
Well graded movies tend to generate a lot of money, but there is no evidence that a high budget drives a high rating. This result is extremely important for your case. Be aware that a modest budget seems to be a disadvantage for reaching high revenue, but be hopeful, because good rentability is possible!
Concerning the genre of the movie, documentary, horror, mystery, and war movies are good opportunities. If you want to earn money and also have a good rating with a low budget, you should concentrate on documentaries only. If your goal is to earn money despite a low budget, documentary, horror, and mystery genres are the most cost-effective. Finally, if you want to have a good rating with a low budget, documentary and war movies are the best.
From this result you can see that documentaries are quite relevant for you. Documentaries have the lowest budget and revenue, but are very well graded, which makes them a well-suited genre for the situation you are in.
Just for your interest, western movies generate high revenue and are well graded but require quite a high budget.
For the production countries, you should definitely concentrate on Germany, Denmark, Spain, and South Korea only, as they combine all the advantages.
With the results presented in this report, we suggest that you focus on the documentary, horror, and mystery genres to get high revenue relative to your low budget. To also reach a high rating, we encourage you to focus on documentaries. In this case, the revenue you can get is not extremely high but you can still easily earn 10 times the given budget. Another strong point for the documentary is that you are not required to select a cast because nature is outside just for you, ready to become a movie star.
You have just finished reading the report. You take a deep breath. You look outside the window and your eyes shine. The world is full of amazing creatures, views and diversity. You are going to produce a documentary that everyone is going to remember.