
Author Archives: Thomas Cleary

Gathering and Processing Documents

One of the first problems I faced was how to collect all of the documents. American Quarterly articles are available in two databases: JSTOR holds 1949 to 2013 (everything in the publication except the last six years), and Project Muse holds articles from 1996 to the present. Rather than downloading each article individually and dealing with two different file naming conventions, I used Zotero as a management platform. This let me export the documents in a uniform way and create a CSV file listing them.
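A listing like this makes it easy to check coverage before any analysis. The snippet below is a minimal sketch, not the workflow used for this project: it assumes the CSV has a column named "Publication Year" (the name I would expect from a Zotero CSV export, but check the header of your own file), and the helper names `load_listing` and `count_by_year` are made up for illustration.

```python
import csv
from collections import Counter


def load_listing(path):
    """Read a CSV listing into a list of dicts, one per document.

    utf-8-sig drops a byte-order mark if the export starts with one.
    """
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return list(csv.DictReader(handle))


def count_by_year(rows, year_field="Publication Year"):
    """Count documents per year, skipping rows with a missing or non-numeric year.

    year_field is an assumption about the column name; adjust it to match
    the header row of the actual export.
    """
    counts = Counter()
    for row in rows:
        year = (row.get(year_field) or "").strip()
        if year.isdigit():
            counts[int(year)] += 1
    return dict(sorted(counts.items()))
```

Running `count_by_year(load_listing("listing.csv"))` (with a hypothetical file name) gives a year-to-count mapping, which should show quickly whether any years are missing or whether the 1996 to 2013 overlap between the two databases produced duplicates.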

When planning how to collect the documents, I also looked at JSTOR’s Data for Research service but decided against it for two reasons. First, it included everything in the journal: not just the articles but also book reviews, front matter, back matter, and so on. All of this would have been too much to sort through and clean up. Second, since JSTOR only holds the publication through 2013, its data service did not cover the most recent years of the journal.

After collecting the documents as PDFs, I needed to clean them up and convert them into plain text files that could be used for text analysis in Mallet and AntConc. Using Adobe Acrobat Pro, I removed the cover sheet from each document and then exported the documents as “Text (Accessible)” files. It was not possible to export the documents gathered from JSTOR as true plain text files, so the accessible version with minimal formatting was the best option.
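Text exported from PDFs usually still carries layout residue such as page breaks and words hyphenated across lines. The snippet below is a rough sketch of one possible extra cleanup pass, assuming the exported files are read in as ordinary strings. It is not part of the Acrobat workflow described above, and the function name `clean_exported_text` is hypothetical.

```python
import re


def clean_exported_text(raw):
    """Apply a few conservative cleanup steps to text exported from a PDF."""
    # Normalize Windows and old Mac line endings to "\n".
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    # Treat form feeds (page breaks) as line breaks.
    text = text.replace("\f", "\n")
    # Rejoin words that were hyphenated across a line break.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Trim whitespace at the start and end of every line.
    text = "\n".join(line.strip() for line in text.split("\n"))
    # Keep at most one blank line between paragraphs.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

One trade-off to be aware of: the hyphen rule also merges genuine compounds that happen to break at a line end ("well-" followed by "known" becomes "wellknown"), so it is worth spot-checking a few files before running it over the whole collection.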

What data to gather?

When first approaching the project, I needed to decide what would make up its data. While it was clear that I would be using keywords and topic modeling to explore the documents, which documents would I be exploring?

Since its inception in 1949, American Quarterly has changed its format a few times, but it has regularly published a combination of articles and book/event reviews. Over the years the reviews were presented in many different ways: sometimes showcasing publications grouped by theme, other times taking individual books or events and reviewing them in the context of American Studies. Some issues had no reviews at all. This raised a question: should I include the reviews in the hope that they would add insight into when topics were being discussed? Or would the fluctuation in the number of reviews over- or under-represent certain topics, given that topic modeling treats each article as an individual document?

Because the purpose of the journal’s reviews changed so much over the years, and because my capstone mentor advised that the reviews were not good indicators of what was actively happening in the field (though they might be interesting to look at in the future), I decided not to include them in the data. This left me with just the core articles of original research that made up the journal.