Gathering and Processing Documents

One of the first problems in gathering the documents was simply how to collect them all. American Quarterly articles are accessible in two different databases: JSTOR holds 1949 to 2013 (everything in the publication except the last 6 years), and Project Muse holds articles from 1996 to the present. Instead of downloading each article individually and dealing with the different file naming conventions, I used Zotero as a management platform, which allowed me to export the documents in a uniform way and to create a CSV listing them.
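A CSV like the one Zotero exports makes it easy to inventory the corpus before analysis. Below is a minimal sketch of reading such a file with Python's standard library; the column names ("Title", "Publication Year") follow Zotero's CSV export format, and the sample rows are purely illustrative, not real articles.

```python
import csv
import io

# Illustrative stand-in for a Zotero CSV export of the collected articles.
sample = '''Title,Publication Year
"Example Article One",1950
"Example Article Two",2005
'''

# Parse the CSV into dictionaries keyed by column name.
rows = list(csv.DictReader(io.StringIO(sample)))

# Sort the inventory chronologically to check coverage across the journal's run.
by_year = sorted(rows, key=lambda r: int(r["Publication Year"]))
for r in by_year:
    print(r["Publication Year"], r["Title"])
```

In practice the same loop can flag gaps or duplicates in the collection before any conversion work begins.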

When planning the collection, I also looked at JSTOR's Data for Research service, but decided against it for a few reasons. First, it included everything in every issue of the journal, not just the articles but also book reviews, front matter, back matter, and so on, which would have been too much to sort through and clean up. Second, since JSTOR only holds the publication through 2013, its data services could not cover the journal's most recent years.

After collecting the documents as PDFs, I needed to clean them up and convert them into plain text files that could be used for text analysis in Mallet and AntConc. Using Adobe Acrobat Pro, I removed the cover sheet from each document and then exported it as a "Text (Accessible)" file. It was not possible to export the documents gathered from JSTOR as true plain text files, so the accessible version, with its minimal formatting, was the best option.
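Text exported from PDFs usually needs one more cleanup pass before tools like Mallet or AntConc can tokenize it sensibly. The sketch below shows one such pass in Python, assuming two common artifacts of PDF export: words hyphenated across line breaks and hard-wrapped lines. It is a generic illustration of the kind of cleanup involved, not the exact steps used here.

```python
import re

def clean_extracted_text(raw: str) -> str:
    """Light cleanup for PDF-extracted text before corpus analysis."""
    # Rejoin words split across line breaks: "Ameri-\ncan" -> "American"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Collapse remaining line breaks and runs of whitespace into single spaces
    text = re.sub(r"\s+", " ", text)
    return text.strip()

print(clean_extracted_text("Ameri-\ncan Quarterly pub-\nlishes  essays."))
# -> American Quarterly publishes essays.
```

Running a pass like this over every exported file keeps word counts and concordance results from being skewed by line-break artifacts.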
