The dataset I chose is "movie data," compiled into a spreadsheet provided by the class I'm taking at the University of Utah. You can find the ratings and revenue from this dataset at RottenTomatoes and IMDB. My initial question is whether there is a pattern of a genre rising to the top at some point in time and then dropping almost instantly to an unpopular genre within a year or two.
Exploring the data confirmed that this question could be answered easily with specific filters. Digging in, one of the first things I noticed is a field called "Number of Records." After sorting the data from largest to smallest and comparing that field to the titles, I found that some movies have more than one record.
This is a problem because a movie that is counted more than once gives inaccurate results when other measures are compared against that dimension. I was able to filter out those inaccurate records. As I explored other fields, I started seeing NULL values. I decided to exclude the NULLs in the genre and year fields to keep the data understandable. Looking at the remaining data confirmed that my filters were good enough to keep exploring.
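For anyone who wants to try the same cleanup outside Tableau, here is a minimal Python sketch of the two filters described above. The field names (`title`, `genre`, `year`) and the sample rows are made up for illustration and are not taken from the class spreadsheet. Treating two rows with the same title as duplicates is also an assumption, since remakes can share a title.

```python
# Hypothetical rows standing in for the spreadsheet; titles and values are made up.
rows = [
    {"title": "Movie A", "genre": "Action", "year": 2003},
    {"title": "Movie A", "genre": "Action", "year": 2003},  # duplicate record
    {"title": "Movie B", "genre": None, "year": 1999},      # NULL genre
    {"title": "Movie C", "genre": "Drama", "year": None},   # NULL year
    {"title": "Movie D", "genre": "Comedy", "year": 2006},
]


def clean(rows):
    """Keep one record per title and drop rows with a missing genre or year."""
    seen = set()
    kept = []
    for row in rows:
        # Skip rows where the genre or the year is NULL.
        if row["genre"] is None or row["year"] is None:
            continue
        # Skip any title we have already kept (assumed duplicate).
        if row["title"] in seen:
            continue
        seen.add(row["title"])
        kept.append(row)
    return kept


cleaned = clean(rows)
print([r["title"] for r in cleaned])
```

In Tableau I did the same thing with filters on the sheet, so this is only a rough equivalent.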
The first thing I decided to do was explore the gross income per genre in a given year. I split the data into Worldwide Gross and US Gross to compare the two, which made the data very simple to compare. The Worldwide Gross lines followed patterns similar to the US Gross lines, which suggests a good correlation. The lines were steady for Western and Musical films, along with some others, so I filtered those away since they didn't help answer my initial question. Because that data may be relevant to a future question, I hid it rather than excluding it.
This simplified my data to the following.
In the image above, the data is grouped by genre, with the genres stacked on top of one another, and time moves forward on the x-axis. The orange lines show the worldwide gross, which maps to the values on the right-side axis. The blue lines show the US gross, which maps to the values on the left. This isn't the graph I would choose if I were comparing numbers, but since I'm looking at slope changes to find a steep drop, it answers my question very well.
As you can see, the slopes are pretty consistent worldwide and in the US, so we can almost rule out a genre being popular in the US but not popular at all outside the US.
Now there is some weird information that shows the timeline going past 2013. This is impossible, since 2035 has not happened yet. Looking at another bar graph over time, the noisy values seem to be before 1950 and after 2011, so I excluded those from my data. There was still too much to look at, so I also excluded the earlier years, where the values were pretty low because movies did not make as much money in the past. I filtered the data to 1996 and later, giving me this result.
This is much nicer to read, since we can focus on the details of the slopes.
To answer my own question, I'm looking for the areas of the graph that look most like a mountain, since I wanted a rapid increase followed by a rapid decrease, which shows up as a spike in the chart. Looking at the figure above, I concluded that several genres had a dramatic increase and then decrease in popularity, each in a different year. These are Thriller/Suspense in 1997, Drama in 2000, Comedy in 2006, Adventure in 2004, and Action in 2003.
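I picked these spikes out by eye, but the "mountain" idea can also be written down as a simple rule. The sketch below is one possible way to do it, not what Tableau does: a year counts as a spike when its gross is well above both the year before and the year after. The function name `find_spikes`, the 25% threshold, and the sample numbers are all assumptions chosen for illustration. It also assumes the years in the series are consecutive.

```python
def find_spikes(series, min_change=0.25):
    """Return the years where gross rose and then fell sharply.

    series maps year -> gross. A year is a spike when its value exceeds
    both the previous and the next year's value by at least min_change,
    measured as a fraction of the peak value.
    """
    spikes = []
    years = sorted(series)
    # Walk over each (previous, current, next) triple of years.
    for prev, year, nxt in zip(years, years[1:], years[2:]):
        peak = series[year]
        if peak <= 0:
            continue
        rise = (peak - series[prev]) / peak
        fall = (peak - series[nxt]) / peak
        if rise >= min_change and fall >= min_change:
            spikes.append(year)
    return spikes


# Made-up yearly totals (in millions of dollars) for one hypothetical genre.
example = {2001: 400, 2002: 450, 2003: 900, 2004: 500, 2005: 520}
print(find_spikes(example))  # prints [2003]
```

A lower threshold would flag more years and a higher one fewer, so the result depends on how steep a "mountain" has to be.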
This was surprising, since I didn't think many genres would show this kind of effect. My next question is which of these genres had the most significant rise and drop. Looking at the graph, it really comes down to Action or Thriller/Suspense, so we need to compare them side by side.
Since these two look nearly identical, another factor to consider is the amount of money their movies made. I was able to get my graph down to the following area chart.
Looking at this area graph, you can see that Action definitely made a lot more money. Looking at the beginning and end points, we can see that Thriller/Suspense came back to where it started. Action actually dropped a lot more, going from 1.4 billion dollars to 1.15 billion dollars in 2004. That is a large impact, since Thriller/Suspense went up and came back down to about 160 million dollars. I would have to conclude that Action took the greater fall, since it lost more money when it dropped than it had gained when it started to increase.
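To put a number on the Action drop, here is the arithmetic as a small Python sketch. It only uses the two Action figures quoted above. The variable names are mine.

```python
# Figures quoted in the post for Action, in dollars.
action_peak = 1_400_000_000
action_after = 1_150_000_000  # value in 2004

# Absolute drop and the drop as a percentage of the peak.
drop = action_peak - action_after
drop_pct = round(100 * drop / action_peak, 1)

print(f"Action fell by ${drop:,} ({drop_pct}% of its peak)")
# prints: Action fell by $250,000,000 (17.9% of its peak)
```

So the Action drop works out to 250 million dollars, or a little under a fifth of its peak.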
Overall, Tableau is super fun. I never knew how much functionality a program for visualizing large-scale data could have, or how quickly and efficiently it could help you understand the data. My favorite things are switching between graphs and saving my spreadsheets so I can come back to them when I don't get anywhere with some other data formatting. I'm excited to use Tableau in my research and studies throughout the semester.



