To support Material Culture in the 19th Century German Novel, the HDW is developing a number of new capabilities:
- Natural Language Processing, including tokenization, part-of-speech tagging, stemming and lemmatization.
- The management of large-scale databases. The latest version of our token database holds more than 37 million tokens.
- The use of standard linguistic taxonomies. We've started using WordNet, mostly as a way of getting used to handling large taxonomies and of evaluating taxonomy-driven methods on smaller sets of English documents.
We recently acquired a copy of GermaNet, a German equivalent of WordNet, and have just started examining it.
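To make the difference between stemming and lemmatization concrete, here is a minimal Python sketch. It assumes a regex tokenizer, a toy suffix-stripping stemmer and a tiny hypothetical lemma dictionary (`LEMMAS`). None of these are the HDW's actual tools, and part-of-speech tagging is left out. A production pipeline would more likely use an established German stemmer and a lemmatizer backed by a full morphological lexicon.

```python
import re

# Tokens are runs of word characters (Unicode-aware, so umlauts and ß stay
# inside a token) or single punctuation marks; whitespace is skipped.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

def tokenize(text: str) -> list[str]:
    """Split text into word and punctuation tokens."""
    return TOKEN_RE.findall(text)

# Toy suffix list, for illustration only (assumption, not a real German stemmer).
SUFFIXES = ("ern", "en", "er", "es", "e", "s", "n")

def stem(word: str) -> str:
    """Lowercase the word and strip the first matching suffix,
    provided at least three characters remain."""
    lower = word.lower()
    for suffix in SUFFIXES:
        if lower.endswith(suffix) and len(lower) - len(suffix) >= 3:
            return lower[: -len(suffix)]
    return lower

# Hypothetical lemma dictionary mapping inflected forms to dictionary forms.
LEMMAS = {"häuser": "Haus", "kleider": "Kleid", "ging": "gehen"}

def lemmatize(word: str) -> str:
    """Look up the dictionary form; fall back to the surface form."""
    return LEMMAS.get(word.lower(), word)
```

With these definitions, `tokenize("Die Häuser, die Kleider.")` yields `['Die', 'Häuser', ',', 'die', 'Kleider', '.']`. `stem("Häuser")` returns the truncated string `"häus"`, while `lemmatize("Häuser")` returns the dictionary form `"Haus"`. For an irregular form such as `"ging"`, suffix stripping leaves the word unchanged and only the lookup recovers `"gehen"`.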
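The text does not say which database system or schema holds the token data. As a minimal sketch under that uncertainty, the following stores one row per token occurrence in SQLite (in memory here), using a hypothetical `tokens(doc_id, position, token)` table with an index on `token`. The index lets frequency and lookup queries avoid scanning every row as the table grows.

```python
import sqlite3

def build_token_db(docs: dict[str, list[str]]) -> sqlite3.Connection:
    """Store every token with its (hypothetical) document id and position."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tokens (doc_id TEXT, position INTEGER, token TEXT)")
    # Index on the token column so per-token queries stay fast at scale.
    conn.execute("CREATE INDEX idx_tokens_token ON tokens(token)")
    rows = (
        (doc_id, pos, tok)
        for doc_id, toks in docs.items()
        for pos, tok in enumerate(toks)
    )
    conn.executemany("INSERT INTO tokens VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

def token_frequency(conn: sqlite3.Connection, token: str) -> int:
    """Count occurrences of one token across all documents."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM tokens WHERE token = ?", (token,)
    ).fetchone()
    return count

def docs_containing(conn: sqlite3.Connection, token: str) -> list[str]:
    """Return the sorted ids of documents in which the token appears."""
    cur = conn.execute(
        "SELECT DISTINCT doc_id FROM tokens WHERE token = ? ORDER BY doc_id",
        (token,),
    )
    return [row[0] for row in cur]
```

For example, after `conn = build_token_db({"novel_a": ["der", "Rock", "und", "der", "Hut"], "novel_b": ["ein", "Hut"]})`, `token_frequency(conn, "Hut")` returns `2`. Keeping `position` in each row means a later query could rebuild the context around any hit.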
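Taxonomy-driven methods generally work by following hypernym (is-a) links from a word up to broader categories. The sketch below assumes a tiny, invented hierarchy of German words stored as child-to-parent links. It is not data from WordNet or GermaNet, which organize synsets rather than bare words and allow more than one hypernym per entry. It only illustrates how tokens might be grouped under broad material-culture categories such as clothing or furniture.

```python
# Hypothetical hypernym links (child -> single parent), invented for
# illustration; real WordNet/GermaNet entries are synsets with possibly
# several hypernyms.
HYPERNYMS = {
    "Zylinder": "Hut",
    "Hut": "Kopfbedeckung",
    "Kopfbedeckung": "Kleidung",
    "Rock": "Kleidung",
    "Kleidung": "Artefakt",
    "Stuhl": "Möbel",
    "Möbel": "Artefakt",
}

def hypernym_path(word: str) -> list[str]:
    """Follow parent links from the word up to the root of the toy hierarchy."""
    path = [word]
    while path[-1] in HYPERNYMS:
        path.append(HYPERNYMS[path[-1]])
    return path

def is_a(word: str, category: str) -> bool:
    """True if category is the word itself or one of its ancestors."""
    return category in hypernym_path(word)

def tag_by_category(tokens: list[str], categories: list[str]) -> dict[str, list[str]]:
    """Map each category to the tokens that fall under it."""
    return {c: [t for t in tokens if is_a(t, c)] for c in categories}
```

Here `hypernym_path("Zylinder")` walks up to `["Zylinder", "Hut", "Kopfbedeckung", "Kleidung", "Artefakt"]`, so a top hat counts as clothing even though the word "Kleidung" never appears next to it in the text.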