Rethinking the History of German Literature 1731-1864

Principal Investigator: Matt Erlin, Washington University in St. Louis

This project employed techniques of probabilistic topic modeling to test a set of longstanding assumptions about the periodization of German literary history. Scholars have applied a fairly consistent set of period designations to categorize German literature written during the roughly one hundred years between 1750 and 1850: “Enlightenment and Sensibility,” “Storm and Stress,” “Weimar Classicism,” “Romanticism,” “Biedermeier,” “Young Germany,” and “Realism.” Applying the MALLET topic modeling toolkit to a data set of 154 novels written between 1731 and 1864, the project sought to evaluate whether these novels do in fact cluster together in ways that support the scholarly consensus, or whether there might be hidden thematic structures in these works that point to new ways of thinking about their “proximity” to one another.
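One concrete way to run such a test is to cluster the novels' document-topic distributions and measure how well the clusters agree with the traditional period labels. The sketch below assumes a doc-topics file produced by MALLET's train-topics command and a hypothetical metadata.csv mapping each novel to its conventional period; the file names, column layout, and number of topics are illustrative assumptions, not details from the project.

```python
# A minimal sketch: do the 154 novels cluster by traditional period?
# Assumes MALLET output along the lines of:
#   bin/mallet train-topics --input novels.mallet --num-topics 50 \
#       --output-doc-topics doc_topics.txt
# The column layout of doc_topics.txt varies by MALLET version; here we
# assume: doc index <tab> document name <tab> one proportion per topic.
import csv
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import adjusted_rand_score

doc_names, rows = [], []
with open("doc_topics.txt", encoding="utf-8") as f:
    for line in f:
        if line.startswith("#"):        # skip any header line
            continue
        fields = line.strip().split("\t")
        doc_names.append(fields[1])
        rows.append([float(x) for x in fields[2:]])
doc_topics = np.array(rows)

# Hypothetical metadata file mapping each novel to its period label,
# e.g. "Romanticism" or "Biedermeier" (columns: filename, period).
with open("metadata.csv", encoding="utf-8") as f:
    period_by_name = {r["filename"]: r["period"] for r in csv.DictReader(f)}
periods = [period_by_name[name] for name in doc_names]

# Agglomerative (Ward) clustering of the topic distributions, cut into
# as many clusters as there are period designations.
tree = linkage(doc_topics, method="ward")
clusters = fcluster(tree, t=len(set(periods)), criterion="maxclust")

# An adjusted Rand index near 1.0 would support the received periodization;
# a score near 0.0 would suggest the periods are not thematically distinct.
print("ARI:", adjusted_rand_score(periods, clusters))
```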

The Enlightenment Novel

The Enlightenment Novel project sought to rethink how this literary period is defined through text mining and other forms of "distant reading." We compared literary texts published between 1750 and 1850 with a collection of philosophical Enlightenment texts via topic modeling (running models both with and without the philosophical texts in the corpus) in order to identify Enlightenment topics and the literary texts with high participation in those topics. We then calculated the Euclidean "distance" between texts based on their topic distributions and used those calculations in Gephi to create a network diagram of works and their connections to the identified Enlightenment topics. The topics deemed "Enlightenment" topics concerned different aspects of Enlightenment thinking.
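The sketch below illustrates, under the same assumed doc-topics layout as above, how the distance calculation and the hand-off to Gephi might look: it computes pairwise Euclidean distances between document-topic vectors and writes an edge list CSV in the Source/Target/Weight format that Gephi's spreadsheet importer accepts. The file names and the 0.25 distance threshold are placeholders, not values from the project.

```python
# A minimal sketch of the distance-and-network step: pairwise Euclidean
# distances between document-topic vectors, exported as an edge list
# that Gephi can import directly (Source, Target, Weight columns).
import csv
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Load the document-topic matrix (same assumed layout as in the sketch
# above: doc index, document name, then one proportion per topic).
doc_names, rows = [], []
with open("doc_topics.txt", encoding="utf-8") as f:
    for line in f:
        if line.startswith("#"):
            continue
        fields = line.strip().split("\t")
        doc_names.append(fields[1])
        rows.append([float(x) for x in fields[2:]])

distances = squareform(pdist(np.array(rows), metric="euclidean"))

with open("edges.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Source", "Target", "Weight"])
    for i in range(len(doc_names)):
        for j in range(i + 1, len(doc_names)):
            if distances[i, j] < 0.25:  # keep only sufficiently close pairs
                # Gephi reads higher weights as stronger ties, so convert
                # the distance into a similarity-style weight.
                writer.writerow([doc_names[i], doc_names[j],
                                 1.0 / (1.0 + distances[i, j])])
```

In Gephi, a file like edges.csv can be loaded through the Data Laboratory's spreadsheet import, after which a force-directed layout makes clusters of thematically close works visible.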

Literary Lists

The Literary Lists project investigated the composition of lists in literature from the eighteenth to the early twentieth century. Using both an English-language and a German-language corpus, we were able to pull the lists, the texts they came from, the authors, and the number of items in each list (as well as the complete sentences in which the lists appear) into a CSV file. From there, we were able to calculate an average list length for each individual text as well as for each author represented in the corpus. We are also interested in the content of the lists and have developed a system for categorizing them. Of particular interest are lists of human attributes, which may reveal Enlightenment ideals present in the texts.
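The averaging step is straightforward once the lists are in a CSV. The sketch below shows one way it might be done with pandas, assuming hypothetical column names ("author", "text", "num_items") and file name, since the actual schema is not given above.

```python
# A minimal sketch of the list-length calculation, assuming the project's
# CSV has (hypothetical) columns named "author", "text", and "num_items".
import pandas as pd

lists = pd.read_csv("literary_lists.csv")   # placeholder file name

# Mean number of items per list, aggregated by individual text ...
per_text = lists.groupby("text")["num_items"].mean()

# ... and by author, across all of that author's texts.
per_author = lists.groupby("author")["num_items"].mean()

print(per_author.sort_values(ascending=False).head(10))
```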

Natural Language Processing

To support this project, the Humanities Digital Workshop (HDW) is developing a number of new capabilities:

  • Natural Language Processing, including tokenization, part-of-speech tagging, stemming, and lemmatization (illustrated in the sketch after this list).
  • The management of large-scale databases. The latest version of the project database holds more than 37 million tokens.
  • The use of standard linguistic taxonomies. We've started using WordNet, mostly as a way of getting used to handling large taxonomies and of evaluating the usefulness of taxonomy-driven methods against smaller sets of documents in English.
  • We recently acquired a copy of GermaNet, a German-language equivalent of WordNet, and have just started examining it.
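As an illustration of the first and third items above, the sketch below runs a short English sentence through tokenization, part-of-speech tagging, stemming, and lemmatization, and then queries the WordNet taxonomy. NLTK is an assumption here; the source does not name the toolkit, and the same steps could be implemented with spaCy or similar libraries.

```python
# A minimal NLP sketch with NLTK (the toolkit choice is an assumption;
# the source does not say which library the HDW uses).
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.corpus import wordnet

# One-time downloads of the required NLTK resources.
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")
nltk.download("wordnet")

sentence = "The travellers were reading sentimental novels."

# Tokenization and part-of-speech tagging.
tokens = nltk.word_tokenize(sentence)
tagged = nltk.pos_tag(tokens)
print(tagged)   # [('The', 'DT'), ('travellers', 'NNS'), ...]

# Stemming (crude suffix stripping) vs. lemmatization (dictionary form).
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
print(stemmer.stem("reading"))                   # 'read'
print(lemmatizer.lemmatize("reading", pos="v"))  # 'read'

# Querying the WordNet taxonomy: synsets and their hypernyms.
for synset in wordnet.synsets("novel", pos=wordnet.NOUN):
    print(synset.name(), "->", [h.name() for h in synset.hypernyms()])
```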