Topic Modeling Workshop

On March 2, 2012, David Mimno from Princeton University offered a hands-on workshop on the topic modeling software MALLET. MALLET is a toolkit for analyzing collections of text documents. In this workshop we focused on one popular method, statistical topic modeling. We began with a description of how to represent text documents as data, but the bulk of the workshop was a hands-on demonstration of how to use topic models to find patterns in a text corpus. We covered four phases of modeling: preparing data and selecting a vocabulary, running models, analyzing results in the context of additional variables like time and document tags, and diagnosing problems in model fit.
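
As a rough illustration of the first phase (representing documents as data and selecting a vocabulary), here is a minimal Python sketch of a bag-of-words representation. It is not MALLET's implementation and does not reproduce the workshop's materials. The stopword list, the `min_doc_freq` threshold, the function names, and the three sample sentences are all assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not MALLET's default list).
STOPWORDS = {"the", "a", "of", "and", "in", "to", "is"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def build_vocabulary(docs, min_doc_freq=2):
    """Keep words that occur in at least `min_doc_freq` documents."""
    doc_freq = Counter()
    for doc in docs:
        # Count each word once per document, not once per occurrence.
        doc_freq.update(set(tokenize(doc)))
    return sorted(w for w, n in doc_freq.items() if n >= min_doc_freq)


def to_count_vectors(docs, vocabulary):
    """Represent each document as a vector of word counts over the vocabulary."""
    index = {w: i for i, w in enumerate(vocabulary)}
    vectors = []
    for doc in docs:
        row = [0] * len(vocabulary)
        for tok in tokenize(doc):
            if tok in index:
                row[index[tok]] += 1
        vectors.append(row)
    return vectors


# Hypothetical sample corpus, invented for this sketch.
docs = [
    "The whale swam in the sea.",
    "A ship sailed the sea and the whale followed the ship.",
    "The garden is full of roses.",
]
vocab = build_vocabulary(docs, min_doc_freq=2)
vectors = to_count_vectors(docs, vocab)
```

Each document becomes a row of counts over the selected vocabulary, and count data of this general kind is what a topic model takes as input. With a threshold this strict the third document ends up with no vocabulary words at all, which shows how vocabulary choices can affect what a model is able to see.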