Midterm Project
Sources
For this project, I used the Tate Museum Artwork and Tate Museum Artist datasets. The Tate collection data is available on GitHub and includes one dataset of the artists featured at the museums and another of the art pieces held there. My aim was to combine the two to see how frequently artists from each country and continent have their work featured in the Tate museums. The information I needed was split across the two datasets, which is why I had to use them in combination.
Processes
The majority of my work on the project was spent cleaning and wrangling the data, mostly in RStudio. The first step was combining the artwork and artist data into one data frame, because the visualization needed information from both datasets. Since both datasets included artist names, I was able to join them by artist name. After importing and combining them in R, I removed rows with missing data and dropped any columns not needed for the visualization. The most important columns were the year of each art piece, the name of the artist, and the artist’s place of birth.
The most tedious part of the cleaning was putting all the places of birth into a uniform format. Each row had a city, a country, both, or some other region, which made it hard to compare where each artist was from. To make them uniform, I reduced each location to a country. Many country names were given in their native languages while others were not, so much of the work involved manually changing them to their English names. This ensured that each country matched when doing the calculations. It also made it possible to group the countries by continent, since R could easily identify continents when the country names were in English.
I used Flourish to visualize the data. Its area chart required a wide dataset, with each country as its own column and the artwork counts as the values. So, I reshaped the dataset to wide format before exporting it so that Flourish could quickly convert it into the visualization shown in the Presentation section.
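The steps above (join by artist name, drop incomplete rows, reduce each birthplace to an English country name, then reshape to a wide table) can be sketched in a few lines. This is only an illustration, written in plain Python rather than the R used for the project. The column names (`name`, `placeOfBirth`, `artist`, `year`), the sample rows, and the `NATIVE_TO_ENGLISH` lookup are all assumptions made for the example, not the actual Tate data or the code used here.

```python
from collections import Counter

# Hypothetical sample rows; the real Tate files have many more columns.
artists = [
    {"name": "Artist A", "placeOfBirth": "London, United Kingdom"},
    {"name": "Artist B", "placeOfBirth": "Deutschland"},
    {"name": "Artist C", "placeOfBirth": "Paris, France"},
    {"name": "Artist D", "placeOfBirth": ""},  # missing birthplace
]
artworks = [
    {"artist": "Artist A", "year": "1805"},
    {"artist": "Artist A", "year": "1805"},
    {"artist": "Artist B", "year": "1805"},
    {"artist": "Artist C", "year": "1870"},
    {"artist": "Artist D", "year": "1870"},
    {"artist": "Artist A", "year": ""},  # missing year
]

# Hand-built lookup standing in for the manual renaming step (assumed entries).
NATIVE_TO_ENGLISH = {"Deutschland": "Germany", "España": "Spain"}


def to_country(place):
    """Reduce 'City, Country' or 'Country' to an English country name.

    A city-only or region-only value would need its own lookup entry.
    """
    country = place.split(",")[-1].strip()
    return NATIVE_TO_ENGLISH.get(country, country)


def join_and_clean(artworks, artists):
    """Join artworks to artists by name and drop rows with missing data."""
    birthplace = {a["name"]: a["placeOfBirth"] for a in artists}
    rows = []
    for work in artworks:
        place = birthplace.get(work["artist"], "")
        if not place or not work["year"]:
            continue  # skip rows missing a birthplace or a year
        rows.append({"year": int(work["year"]), "country": to_country(place)})
    return rows


def to_wide(rows):
    """Count artworks per (year, country); one row per year, one column per country."""
    counts = Counter((r["year"], r["country"]) for r in rows)
    countries = sorted({r["country"] for r in rows})
    years = sorted({r["year"] for r in rows})
    return [{"year": y, **{c: counts[(y, c)] for c in countries}} for y in years]


wide = to_wide(join_and_clean(artworks, artists))
for row in wide:
    print(row)
```

Each row of `wide` holds one year, with a count column per country and zeros where a country has no artworks that year. This is the wide shape described above for the Flourish chart.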
Presentation
This area chart shows the birthplaces of the artists featured in the Tate museums over time. The chart starts at 1750 because the datasets include more artwork from those years onward. Hovering over a colored area shows which country it represents, along with other details such as how many artworks came from artists born in that country in a given year, who those artists were, and which continent the country is in. The continent dropdown narrows the results to countries in that continent. The legend below the graph shows the color assigned to each country, and you can click a country there to show or hide it on the graph.
Significance
I presented my project question as an area chart because I felt it was the best way to analyze the trend in artists’ birthplaces over time. While a map or other types of graphs would be useful for seeing the total count for each country, they would not be able to represent the patterns over time, which provide important context. For example, artists from the United Kingdom are drastically more represented in the museums than artists from any other country, which is expected. However, this graph shows that most of that representation falls in the early 19th century, and that after that, many more artists from other countries are represented in the museums. Because the chart includes years, it can also be read alongside historical context. For instance, looking at the artists from Asia, artists from India were the only ones to have art featured at the Tate museums between 1860 and 1930. This is because that timeframe aligns with Britain’s rule over India. Contexts like these are important because they show which stories are being told in the museums, which are not, and why that is the case.