The tradition of text analysis

Humanities Computing seems to be an oxymoron, two words at war from opposite ends of the spectrum. After all, to compute is to calculate, but what the humanities concern themselves with is inherently abstract. In the humanities, you don’t calculate mathematically; you analyze, you interpret and you understand. It’s the very exploration of the ‘humanness’ of humans.

Given that, it’s a bit tricky to see how computers can benefit humanities research. They certainly can’t comprehend the splendor of a Shakespearean text or critique a piece of Renaissance art. However, even where they cannot do something themselves, there are a number of ways they can assist us in doing it. They can scale projects from the few to the many, as with Wikipedia. They can assist our workflows, with everything from data processors to annotation tools. And they can process data and spit it out in new ways, offering us new opportunities to analyze it.

The last of these is what text analysis does. A computer can digest the entire works of Jane Austen and find patterns that would be too difficult for most humans to spot. In seeing these patterns, however, a human can realize something new about the way that Jane Austen wrote, gaining a better understanding of her work.

Last year, I did a study of American media coverage of strikes in France. At some point, I decided to run the articles that I was analyzing through TAPoR’s text analysis tools. When you run “List Words”, you get sparklines (little inline bar graphs) of the top few words. I soon began to notice that words like “president” and “government” consistently appeared near the beginning of articles, while words like “union” became more frequent near the end. This very strong trend allowed me to form a hypothesis that guided my work.
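To give a rough idea of what sits behind a sparkline like that, here is a minimal sketch in Python. It is not TAPoR's actual code; the function names, the sample text and the choice of splitting an article into equal slices are all my own assumptions for illustration. It counts how often a word appears in each slice of a text and draws the counts as a tiny text bar graph.

```python
import re

def word_distribution(text, word, segments=5):
    """Count occurrences of `word` in each of `segments` equal slices of the text.

    Hypothetical helper for illustration; not part of TAPoR.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = [0] * segments
    for position, token in enumerate(tokens):
        if token == word:
            # Map the token's position to a slice index from 0 to segments - 1.
            counts[position * segments // len(tokens)] += 1
    return counts

def sparkline(counts):
    """Draw a list of counts as a one-line bar graph, scaled to the largest count."""
    bars = "▁▂▃▄▅▆▇█"
    peak = max(counts) or 1  # avoid dividing by zero when the word never appears
    return "".join(bars[c * (len(bars) - 1) // peak] for c in counts)

# Made-up sample text, standing in for a real article.
sample = "president government president talks union union strike union"

for term in ("president", "union"):
    print(term, sparkline(word_distribution(sample, term, segments=2)))
```

Run over a real article with more slices, a word that clusters near the beginning shows tall bars on the left, and one that rises toward the end shows them on the right.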

Note that, among the things computers can do, this kind of analysis, based on presenting a text in new ways, was the first to become technologically possible. When computers were relatively primitive, math-based analysis was the extent of their abilities. This, I’ve come to believe over the past few months, is why Humanities Computing has developed such a strong tradition of text analysis: it’s simply where the field is rooted. It’s tradition. Today, with many more possibilities for computing to benefit the humanities, the field is not as strongly defined. The term ‘Digital Humanities’ is picking up steam, because HuCo is no longer simply about calculations, and can no longer be described by a single verb.

