Primary Sources into Data

The primary sources we have encountered in class, such as the death certificates and the curator’s records of donations and donors from this week’s tidy dataset assignment, can be turned into data that is easy to consume. Organized well, that data can benefit not just one project but many, each looking at a different aspect of the same source. For example, the data from our tidy dataset project on the curator’s records could be used in a project about the curators themselves, or about the donors and the types of items they donated. It could also be used for a project on spelling and writing during that time. When a primary source is converted to data, ideally tidy data, it’s easier to see correlations between variables and patterns within the dataset. For example, when creating the tidy dataset for the curator’s records, it was interesting to see that all of the donors were men, and that Lewis Nicola was a curator for many years: in the folio section of the dataset, he appears as curator from 1781 to 1785. Having everything laid out in front of you in an organized fashion is also less stressful. It’s easier to scan through the dates or to find a specific donor’s name or a specific donation.

Portion of my tidy dataset

For the examples we have done in class, we entered our primary source as data into an Excel spreadsheet, but I think that’s only one of the first steps in converting a source to data. Other programs can help visualize the data, and we can use the data, or particular variables from it, to create graphs, maps, and more.
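As a rough sketch of what a next step after the spreadsheet could look like, the Python below reads a few rows in CSV form (the format a spreadsheet can be exported to) and tallies them by one variable. The column names and every row here are invented placeholders, not records from the actual curator’s source, and this is only one of many ways to do it.

```python
import csv
import io
from collections import Counter

# Invented placeholder rows standing in for a spreadsheet exported as CSV.
# The column names (year, curator, donor, item) are assumptions for illustration.
csv_text = """year,curator,donor,item
1781,Curator A,Donor 1,book
1781,Curator A,Donor 2,map
1782,Curator B,Donor 1,fossil
1783,Curator A,Donor 3,book
"""

# Each row becomes a dictionary keyed by column name.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Tally donations by a single variable.
donations_per_curator = Counter(row["curator"] for row in rows)
donations_per_year = Counter(row["year"] for row in rows)

# Find every record for one specific donor.
donor_1_rows = [row for row in rows if row["donor"] == "Donor 1"]

# A very plain text "bar chart" of donations per curator.
for curator, count in sorted(donations_per_curator.items()):
    print(f"{curator:<10} {'#' * count}")
```

Because each variable sits in its own column, switching the tally from curators to years or items only means changing the column name.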

Wickham’s principles of tidy data are a way to standardize data and to create a link between the layout of data and its meaning. Wickham has three principles for tidy data: each variable forms a column, each observation forms a row, and each type of observational unit forms a table. A variable, according to Wickham, is “[containing] all values that measure the same underlying attribute (like height, temperature, duration) across units,” while an observation is “[containing] all values measured on the same unit (like a person, or a day, or a race) across attributes.” I organized my own tidy dataset of donors the same way as the examples of tidy data shown in the journal article. Following the principles not only keeps data standardized and organized, but also helps when the data is later used for analysis. As Wickham emphasizes, that is the point of creating tidy data: to streamline the process.

Wickham’s example of Billboard data tidied up
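To make the principles more concrete, here is a minimal sketch in Python of the kind of reshaping involved when a table has one column per week, loosely modeled on the layout of the Billboard example. The artists, tracks, ranks, and the `melt_weeks` helper are all made up for illustration; none of it is Wickham’s actual data or code.

```python
# Invented "wide" table: one row per song, one column per week on the chart.
# Here the week numbers are hidden in the column names instead of being values.
wide_rows = [
    {"artist": "Artist A", "track": "Song One", "wk1": 87, "wk2": 82, "wk3": None},
    {"artist": "Artist B", "track": "Song Two", "wk1": 91, "wk2": None, "wk3": None},
]


def melt_weeks(rows, id_columns=("artist", "track"), prefix="wk"):
    """Turn each week column into its own row (hypothetical helper)."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            # Skip the identifying columns and weeks with no recorded rank.
            if not column.startswith(prefix) or value is None:
                continue
            record = {name: row[name] for name in id_columns}
            record["week"] = int(column[len(prefix):])  # "wk2" -> 2
            record["rank"] = value
            tidy.append(record)
    return tidy


# Tidy result: each variable is a column, each observation is a row.
tidy_rows = melt_weeks(wide_rows)
for record in tidy_rows:
    print(record)
```

In the tidy version, week and rank are ordinary columns, so sorting or filtering by week works the same way as sorting or filtering by artist.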


Wickham, Hadley. 2014. “Tidy Data.” Journal of Statistical Software 59 (10): 1–23.
