Acquiring and Preparing a Corpus of Texts (online RCR; GS717.07)

Date: Thursday, September 21, 2023
Time: 9:00am - 11:00am
Location: Online

Before you can undertake automated text analysis, it's necessary to obtain a corpus of digitized texts and, in many instances, take steps to prepare them for further processing. This digital humanities workshop focuses on the technical, logistical, and legal dimensions of corpus development. We will explore the risks and benefits of optical character recognition (OCR); file formatting and naming issues; organization strategies for large corpora; problems of data cleaning and preparation; common sources for textual research data; and legal and ethical concerns around the use of textual corpora.