Building Topic Models through Selective Document Exclusion

Earlier this month, I attended the ASIS&T 2011 Annual Meeting, where our paper was selected for the Best Paper Award.

In "Building Topic Models in a Federated Digital Library through Selective Document Exclusion," we presented a way to improve the coherence of algorithmically derived topic models.

The work stems from the topic modeling we were doing in our IMLS DCC research group, first with PLSA and later with LDA. The system we work with brings together cultural heritage content from over a thousand institutions and, as a result, contains quite diverse and often problematic metadata. This noise makes it hard to infer strongly coherent topic models, so Miles proposed identifying topically weak documents and removing them from topic training, an idea that proved successful. The paper outlines how this was done and what the outcomes were.
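To make the idea of "topically weak" documents more concrete, here is a minimal sketch. It is not the method from the paper, which this post doesn't spell out. It assumes that a first-pass topic model has already produced a topic distribution for each document, and it treats a document as weak when that distribution is spread out, as measured by normalized Shannon entropy. The function names and the `max_entropy` cutoff of 0.8 are hypothetical choices made only for illustration.

```python
import math
from typing import List, Sequence


def normalized_entropy(dist: Sequence[float]) -> float:
    """Shannon entropy of a document's topic distribution, scaled to [0, 1].

    0 means all probability mass sits on one topic (topically strong);
    1 means the mass is uniform over all topics (topically weak).
    """
    k = len(dist)
    if k < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in dist if p > 0)
    return h / math.log(k)


def select_training_docs(
    theta: Sequence[Sequence[float]], max_entropy: float = 0.8
) -> List[int]:
    """Return indices of documents concentrated enough to keep for retraining.

    Assumption: `max_entropy` is a hypothetical cutoff, not a value from the paper.
    """
    return [i for i, dist in enumerate(theta) if normalized_entropy(dist) <= max_entropy]


# Document-topic proportions from a hypothetical first-pass model with 3 topics.
theta = [
    [0.90, 0.05, 0.05],  # clearly about topic 0
    [0.34, 0.33, 0.33],  # spread evenly: topically weak, e.g. sparse metadata
    [0.10, 0.80, 0.10],  # clearly about topic 1
]
keep = select_training_docs(theta)
print(keep)  # → [0, 2]
```

You could then train a second model on only the retained documents. Entropy is just one plausible measure of topical weakness. A simpler alternative would be the probability of each document's single most likely topic, and either measure needs its threshold tuned against the collection.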

I encourage you to look through the full paper, which is fairly accessible, or the press release.
