Blog

Returning to DIY

After my post on underestimating the ubiquity of data, Jeff Biggar asked me to expand on my prediction that “business practices and marketing will take a back seat to quality and value to society.”

You can see my response there, related to a paper that I wrote last year, but today I’d like to relate this to Willard McCarty once again, from the narrowed scope of computing only. This is partially for posterity, as his notes greatly overlap with mine, and I’d like to return to them if I ever find myself polishing my paper.

McCarty notes that we’ve seen the “gradual transfer of ability to construct artifact from highly specialized technicians to ordinary users, and the simultaneously increasing technical sophistication of these users”, or DIY computing. This has happened mainly for three reasons: the regaining of computing unity through networking, the development of operating systems so as to free users from higher-level tasks, and an amateurization in the nature of software (notably the introduction of higher-level programming languages).

These three points provide a premise for the trend of increasingly content-driven computing. When more people are able to create, more are likely to do so when there is a necessary artifact. However, McCarty’s point on operating systems is important as a generalized rule: freedom from higher-level tasks. Rather than many people reworking the same problem, why not standardize the solution and let them worry about other things? The operating system takes you partway there; software libraries and modules take you further. A JavaScript library such as jQuery, for example, lets web developers stop worrying about JavaScript compatibility between browsers by offering its own functions, which it then translates properly into JavaScript based on the quirks of whichever browser it’s running in. Ruby on Rails is another web technology that builds on sensible defaults to allow users to skip higher-level concerns like full links between their modularized code, full functions for common tasks, or complex server interaction. Consider that Twitter was originally built on Ruby on Rails. Twitter was a very novel concept and, as those who’ve tried it can attest, is hard to understand in strictly abstract terms. However, Ruby on Rails allowed the creators to build Twitter as a side project, in time away from their day jobs, and to experience the new concept.

Externally looking inwards

Though class has moved on, I’m still digesting the early chapters of Willard McCarty’s Humanities Computing.

One thought that I posted during my Day of DH blogging is the idea of trying to model oneself. What if you started writing down every self-reflective thought that you have and real-life character example, and subsequently organized them into some sort of logic? Would such a systematic process allow you to derive understanding that you haven’t explicated, by virtue of it seeming “wrong” without it? Would such a removed process help you reach a more concise understanding of your quirks and your motivations?

Computational modelling and the Netflix Prize

In Humanities Computing, Willard McCarty notes that “computational form, which accepts only that which can be told with programmatic explicitness and precision, is thus radically inadequate for representing the full range of knowledge – hence useful for isolating what gets lost when we try to specify the unspecifiable” (25). In other words, there are certain ways of knowing that we cannot explain, but because computers can only accept precise directions, they allow us to understand what’s missing when we do try to model these ways of knowing. When you attempt to explicate something human through a series of instructions, you can compare the result to what you feel the result should be, and adapt the instructions as necessary. Thus, as Willard McCarty did in his research on personification in Ovid’s Metamorphoses, the process of modelling becomes an iterative process of comparing, identifying, and changing. However, the trick to improvement is that any changes affect all examples, and thus a change to accommodate an outlier must also not break the model’s tolerance for something already accounted for. Or, if it does break it, perhaps it had not been explained by the model in the first place.

Such a process of modelling is apparent in what’s perhaps the best-known data-mining project: the Netflix Prize. The Netflix Prize is a $1 million prize being offered by Netflix to the team that can improve on the baseline performance of its recommendation algorithm by 10%. The contest has been running since October 2006 and teams are in the home stretch. Eleven teams are at 9% or higher, with the top two teams at 9.64% and 9.63%. However, progress has slowed to a crawl, as the teams push the limits of how much a computer can understand the intricacies of human preference.
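To make the percentages concrete: the contest scored entries by root-mean-square error (RMSE) between predicted and actual ratings, and "improvement" means how far a team's RMSE falls below the baseline's. Here is a minimal sketch of that arithmetic in Python. The ratings and predictions are invented toy numbers, and the function names are my own, not anything from Netflix.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length lists of ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of the same length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def improvement_pct(baseline_rmse, candidate_rmse):
    """Percentage by which the candidate's error is lower than the baseline's."""
    return (baseline_rmse - candidate_rmse) / baseline_rmse * 100.0


# Toy data (hypothetical): five ratings on a 1-5 scale.
actual = [4, 3, 5, 1, 2]
baseline = [3.0, 3.0, 3.0, 3.0, 3.0]   # always guess the middle rating
candidate = [3.5, 3.0, 4.0, 2.0, 2.5]  # a model that leans the right way

gain = improvement_pct(rmse(baseline, actual), rmse(candidate, actual))
print(f"improvement over baseline: {gain:.2f}%")
```

On real data the gains are nowhere near as easy as in this toy case, which is the whole point of the contest: each additional hundredth of a percent means explaining a few more stubborn ratings without getting the rest wrong.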

Throughout the contest, teams have been remarkably open about their strategy, “acting more like academics huddled over a knotty problem than entrepreneurs jostling for a $1 million payday” (Wired – This Psychologist Might Outsmart the Math Brains Competing for the Netflix Prize). Thus, we see the effects of the iterative process of modelling: one team has an “a-ha” moment, notes the idea to the community and suddenly, everyone else has the same eureka moment.

What I find most fascinating about the prize is that there is a limit to what can be done. It apparently took only a month, out of the last two and a half years, for the leaders to get halfway there. Yet now everyone’s poring over those outliers, and can’t quite figure out why people like the most polarizing of films. The New York Times Magazine refers to this as the “Napoleon Dynamite” problem, after one of the worst offenders. Others include “I Heart Huckabees,” “Lost in Translation,” “Fahrenheit 9/11,” “The Life Aquatic With Steve Zissou,” “Kill Bill: Volume 1” and “Sideways”.

History in academia

On Humanist, Willard McCarty recently wrote an eloquent response to the question, “Why is it that you are looking to the past when you search for answers concerning the future?”, and it’s gotten people talking.

Now the question of history and precedent is an interesting one. I very much believe in founding our current knowledge on what we’ve learned from the past. At my old school, I became very outspoken about the fact that, by fourth-year seminars, my fellow communications students still had a lack of historical understanding in their thoughts, resulting in shallowness and alarmism. For example, we’d heard the exact same arguments about email and Facebook that were raised in the face of the telegraph and telephone and haven’t stood the test of time. To be premised in the present inevitably leads to a problematic and erroneous understanding of the world. This is something that I’m sure most of academia would agree with. Yet, I feel that we do not practice it.

The overwhelming feeling that I’ve had for years is that parts of the academic system are stuck on repeat. Tradition has impacted heavily on us, and we find ourselves continuing decades-old practices and discourses that have not affected the world in any discernible way. We believe so strongly in history, yet we ignore when something has been shown to be, in the ugliest of terms, useless to society.

I’ve repressed this opinion for a long time, until a recent chat with Kathleen got me thinking about it again. It’s the very reason for my choice to study in this field: I feel like the Digital Humanities, in its unfolding state, is an area where I can make a forward-moving difference. How appropriate, then, was the timing of McCarty’s post. He surprised me by addressing this directly and, taking it a step further, did so within the context of Digital Humanities.

Take text-analysis, for example. As a whole text-analysis isn’t terribly successful or satisfying, as many others in the field keep saying, and have said year after year since the early 1960s. Indeed, the postgraduate course in text-analysis that I teach is based on the question of why it is we (firmly in the present, with eyes fixed on the then present moment) run into a metaphorical brick wall so soon after getting started; or less metaphorically, how we can get beyond the level of the individual word and individual words nearby, lemmatized or otherwise, to whatever it is that could be considered “context”; or, more philosophically, how we can possibly justify what we consider “context” to mean in any given textual situation. …

So the literary critic or textual editor, focused on interpretation of texts, doesn’t find him- or herself in a particularly good situation with respect to computing. Yet at the same time, let us say, he or she has this nagging feeling that the computer really could be useful, somehow. And, let us say, this critic, firmly in the present moment, has ideas about what went wrong and might be done about it. Isn’t it important at such a moment to know what’s been tried already? Isn’t it equally or more important to be able to extrapolate from the trajectory that text-analysis, say, has taken all these years to where now it makes sense to go?

Sure, McCarty does not directly address my concern of historical ambivalence, but what he does suggest is that there will emerge people with feelings like mine, not finding what they have “terribly successful or satisfying”, and extrapolating from history how to evolve past the unsuccessful models. I guess the very fact of this discourse is evidence in support of Willard McCarty’s point.

A glimpse into digital culture

Everybody should drop what they’re doing and head over to Thru-You. It’s a wonderful remix project where the creator takes various samples from YouTube (a cello player here, a guitarist there, perhaps a little a cappella ditty) and makes them into songs. There is no way to overemphasize the wonder in these videos/songs.

I love these glimpses into the real-world people on YouTube. It reminds me of Mick Bianci’s phenomenal Youtubers video from over two years ago. The original video has since been removed, but luckily I found the one below.

Underestimating the ubiquity of data

Via FlowingData, I came across “Hal Varian on how the Web challenges managers” from the McKinsey Quarterly.

Varian, Google’s Chief Economist, speaks on a wide variety of issues, but all of them centre around the ubiquity of computing and free information. We are in a time of “combinatorial innovation”, where there’s an abundance of raw components, and innovation lies in using what is already available in the right combinations. In other words, we are standing at the start of a period of potential: we have what we need to innovate and now need to play around with it. Such periods revolve around a specific innovation (electronics in the 20s, integrated circuits in the 70s), and this time around, the fulcrum is the Internet.

This is similar to the point I suggested in a paper last term, where I argued that the ubiquity of tools positions us at the beginning of an “age of innovation” (borrowing the term from Felix Janszen). As more people become comfortable with computing and as tools for software innovation become more accessible, we have been and are going to continue seeing an acceleration in the realization of good ideas. Business practices and marketing, I predict, will take a back seat to quality and value to society. This is why the most successful online companies, such as Facebook, Twitter and Google, concentrate on the product first, and the revenue stream later. I have seen this baffle traditional business types (and of course journalists), but a quality product is the only way a company can ensure that a better service created in some kid’s basement bedroom won’t pull the rug out from under it (as Facebook, Twitter, and Google have all done themselves).

SSHRC scholarships to focus on Business

Jeff Biggar just sent me a link to a “Petition in Support of the SSHRC” by NDP member Niki Ashton. I was surprised to find out the following:

For more than thirty years, the Social Sciences and Humanities Research Council (SSHRC) has been promoting and supporting university-based research and training in the humanities and social sciences. SSHRC funding has been used to complete ground breaking research in countless areas in Canada and around the world.

The Federal Budget presented on January 27th contains a sentence that has the potential to halt this kind of research: “Scholarships granted by the Social Sciences and Humanities Research Council will be focused on business-related degrees”.

These measures are backward and insulting to the thousands of Canadians that are students and researchers in the social sciences and humanities.

What?! Has anybody else heard about this? While, yes, the petition’s wording of “insulting” can apply here, it’s more than something personal. I’m not upset because I’m in the humanities, but because this goes directly against the beliefs and values that have brought me here. After the debacle of a free-market, greed-driven culture that has collectively dug our societies a hole, you would think that we would be moving towards a softer, more humanistic approach to the society of tomorrow. SSHRC scholarships reward the brightest (hi Kathleen!), and this line in the 2009 Federal budget shows that, rather than re-evaluating the vehicle, the Canadian government is looking for smart people to get extra mileage from the broken-down jalopy. It’s the capitalist take on the old communism vs. Stalinism defense: “It’s not that a democracy run on cutthroat greed doesn’t work, it’s that it hasn’t been done right yet”.

Visualising nodal information

In “In Praise of Pattern”, Stephen Ramsay makes much the same point that I made last week: one of the most effective ways that computers can benefit qualitative, non-binary research is by breaking down texts (in the broader definition of the word) and presenting them so that a human can understand them in a way not possible before.

When looking for inspiration for visualisations, Ramsay went fishing, finding that nature forms its own graphs in many places, if you care enough to pay attention. In this spirit of breaking down visual communication, I’d like to go through a thought experiment on visualising a basic piece of information, represented as a node. This is based on some old sketches that I found in my journal, and the node idea is influenced by the Mandala browser.

node-types-node

Okay, so say we have our information organized in nodes. Really, it could just as easily be a screenshot or box of information, but for now, I’ll start with a simple point. Like on a graph, it’s free in its own space; that is to say, the node exists in a larger, two- (or more) dimensional plane, unlike the one dimension that textual information usually follows.

node-types-identifiers

Now, if we have multiple nodes that need to be differentiated, we can easily show this visually through shape or fill (color, shade, texture). Say you’re plotting nodes of hit pop songs from the past fifty years. You could quickly identify their most important characteristics by the look of the node. The gender of the lead singer could be visualized with shape (female=circle, male=square, and none=triangle). Each decade could be assigned a different color, so songs from the sixties could be blue, or songs from the eighties could be pink. Length of songs could be quickly shown by the size of the node (i.e. longer songs could have larger nodes).
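For anyone who would rather see that mapping as code than as a sketch on paper, here is a rough Python version of the same idea. It only computes what each node should look like and leaves the drawing to whatever tool you prefer. The shape and color choices come from the paragraph above; the `Song` fields, the fallback color, the radius formula and the example song are all my own assumptions for illustration.

```python
from dataclasses import dataclass

# Shape encodes the lead singer (as described above).
SHAPE_BY_LEAD = {"female": "circle", "male": "square", "none": "triangle"}

# Color encodes the decade. Only the sixties and eighties are named above;
# every other decade falls back to an assumed default.
COLOR_BY_DECADE = {1960: "blue", 1980: "pink"}
DEFAULT_COLOR = "gray"  # assumption, not part of the original sketches


@dataclass
class Song:
    title: str
    year: int
    lead: str            # "female", "male" or "none"
    length_seconds: int


def node_style(song, base_radius=4.0, radius_per_minute=2.0):
    """Return the visual attributes for one song's node."""
    decade = song.year // 10 * 10
    return {
        "shape": SHAPE_BY_LEAD[song.lead],
        "color": COLOR_BY_DECADE.get(decade, DEFAULT_COLOR),
        # Longer songs get larger nodes (radius values are arbitrary).
        "radius": base_radius + radius_per_minute * song.length_seconds / 60,
    }


# A made-up song, purely for illustration.
example = Song("Hypothetical Hit", 1984, "female", 240)
print(node_style(example))
```

Keeping the encoding in plain lookup tables means you can swap a color or add a decade without touching the rest, which suits a thought experiment that is still changing.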

node-types-relationships

Once you have a number of nodes, you can show relationships between them. Since they exist in a multidimensional plane, distance is probably the most apparent way of showing relationship between items. Other ways include branching, which shows a flow, and orbit, which can use distance (from center) to show relevance to the main node, but also show relationships between the satellite nodes (by virtue of how close they exist in orbit).

node-types-longitudinal-analysis

The last form of visualising that I considered is longitudinal analysis, or showing relationships and changes over time. Traditionally, nodes are graphed with time as the independent variable. However, in computing, animation is also a useful and effective way to show temporal change. I’ve found that an increasing amount of visual communication relies on animation not for novelty, but to emphasize a point. This 2008 Democratic primary breakdown is a prime example (for example, click between “Whites” and “Blacks” and note how the animation brings home the point). Animation can even be combined with graphing, if there’s a different dependent-independent relationship that you hope to show. Google’s motion charts are an example of this.

These concepts just scratch the surface. If you have any of your own ideas, feel free to share them below.

Facebook – don’t speak lest someone hear you

Yesterday, I attended PD Day 2009 from the School of Library and Information Studies (SLIS). There Andrew Keenan, a HuCo/SLIS student, presented on “Evaluating Sociability Online”, in which he tried to identify the distinguishing features of two popular social networks (Facebook and Myspace) and two niche social networks (Twitter and LinkedIn). In his conclusion, he found that each represented a different paradigm in its success: Facebook as private (closed connections based on real-world identities), Myspace as public (persona building and extravagant), Twitter as technology (doing one thing extremely well), and LinkedIn as community (gathering around a commonality, akin to message board communities). The separation helps explain how there’s room for each of these, though Keenan argues that community-based websites will decline in favour of Facebook, while technology-based one-use websites will explode in popularity. I agree with the latter sentiment, especially since such “do one thing well” sites are perfectly fit for plugging into Facebook Platform or OpenSocial.

Keenan’s presentation was extremely refreshing, and I’ve identified why: he understood what he was talking about. I read about Facebook much more than I use it, and it seems that, when on the topic of the service, the tin-foil hats come out and commentators lose all sense of reason. Often, these are commentators from established media that simply don’t consider the big-picture view of social networking in human communication. Having spent many months teaching journalists, I can attest to having witnessed this firsthand. I have seen such loss of reason just as apparently in school, with young people having a similar reaction to such recent upheaval in communications.

It should be noted that there were people in the audience with their apparently rehearsed speeches, shocked and ready to attack Keenan on his evaluation of Facebook. “Facebook isn’t private!” was met with many ‘yeas’. “Even if you have a fake name and delete your account, they’ll still have your photos” was another odd comment. Even the keynote speaker pitched in, mentioning that Facebook is a business that can do whatever it wants with your data and that it violates Canadian privacy laws because the data that you give it is hosted on servers that are possibly in America. Suddenly, the fresh air that was Keenan’s presentation dissipated into hot air. What’s more, within the context of the presentation, the point was right on: in privacy studies on social networks, Facebook has come out on top with the flexibility and strength of its privacy features. As Keenan noted in a response: privacy problems on Facebook are generally a user issue, not a systems issue. This is well addressed by James Grimmelmann in Facebook and the Social Dynamics of Privacy:

The first task of technology law is always to understand how people actually use the technology. Consider the phenomenon called “ghost riding the whip.” The Facebook page of the Ghost Riding the Whip Association links to a video of two young men who jump out of a moving car and dance around on it as it rolls on, now driverless. If this sounds horribly dangerous, that’s because it is. At least two people have been killed ghost-riding, and the best-known of the hundreds of ghost-riding videos posted online shows a ghost rider being run over by his own car.

Policymakers could respond to such obviously risky behavior in two ways. One way—the wrong way—would treat ghost riders as passive victims. Surely, sane people would never voluntarily dance around on the hood of a moving car. There must be something wrong with the car that induces them to ghost ride on it. Maybe cars should come with a “NEVER EXIT A MOVING CAR” sticker on the driver-side window. If drivers ignore the stickers, maybe any car with doors and windows that open should be declared unreasonably dangerous. And so on. The problem with this entire way of thinking is that it sees only the car, and not the driver who lets go of the wheel. Cars don’t ghost ride the whip; people ghost ride the whip.

Over a hundred million people have uploaded personally sensitive information to Facebook, and many of them have been badly burnt as a result. Jobs have been lost, reputations smeared, embarrassing secrets broadcast to the world.

It’s temptingly easy to pin the blame for these problems entirely on Facebook. Easy—but wrong. Facebook isn’t a privacy carjacker, forcing its victims into compromising situations. It’s a carmaker, offering its users a flexible, valuable, socially compelling tool. Its users are the ones ghost riding the privacy whip, dancing around on the roof as they expose their personal information to the world.

Keenan’s presentation slides are available at the PD Day 2009 website. What do you think? I haven’t yet addressed any of the nay-sayers’ issues, but if you’d like to hear more on that, feel free to start a debate.

The tradition of text analysis

Humanities Computing seems to be an oxymoron, the two words at war from opposite ends of the spectrum. After all, to compute is to calculate, but that with which the humanities are concerned is inherently abstract. In the humanities, you don’t mathematically calculate; you analyze, you interpret and you understand. It’s the very exploration of the ‘humanness’ of humans.

Given that, it’s a bit trickier seeing how computers can benefit humanities research. They certainly can’t comprehend the splendor of a Shakespearean text or critique a piece of Renaissance art. However, for what they cannot do themselves, there are a number of ways that they can assist us in doing it. They can scale projects from the few to the many, like with Wikipedia. They can assist our workflows, with everything from data processors to annotation tools. And they can process data and spit it out in new ways, offering us new opportunities to analyze it.

The last of these is what text analysis does. A computer can digest the entire works of Jane Austen and find patterns that would be too difficult for most humans to spot. However, in seeing these patterns, a human can realize something new about the way that Jane Austen wrote, gaining a better understanding of her work.

Last year, I did a study researching American media coverage of strikes in France. At some point, I decided to run the articles that I was analyzing through TAPoR’s text analysis tools. When you run “List Words”, you get sparklines (which are little, inline, bar graphs) of the top couple of words. I soon began to notice that words like “president” and “government” consistently were at the beginning of articles, while words like “union” began to rise near the end. This very strong trend allowed me to form a hypothesis that guided me in my work.
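To show the kind of pattern I mean, here is a small Python sketch that cuts an article into equal slices, counts how often a word falls in each slice, and draws the counts as a text sparkline. This is not TAPoR’s implementation, only my own approximation of the idea; the function names, the five-slice default and the sample article are invented for illustration.

```python
import re

BARS = "▁▂▃▄▅▆▇█"  # eight bar heights, lowest to highest


def position_counts(text, word, segments=5):
    """Count occurrences of `word` in each of `segments` equal slices of the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = [0] * segments
    for i, token in enumerate(tokens):
        if token == word.lower():
            # Map the token's position onto a slice index from 0 to segments - 1.
            counts[i * segments // len(tokens)] += 1
    return counts


def sparkline(counts):
    """Render the counts as a row of bars, scaled to the highest count."""
    peak = max(counts, default=0)
    if peak == 0:
        return BARS[0] * len(counts)
    return "".join(BARS[c * (len(BARS) - 1) // peak] for c in counts)


# A made-up article, far shorter than anything in the actual study.
article = (
    "The president said the government would act. The government met. "
    "Talks continued all week. The union marched and the union leaders "
    "spoke for the union."
)

for word in ("government", "union"):
    print(f"{word:>10} {sparkline(position_counts(article, word))}")
```

Run over a whole set of articles, a row of bars that is heavy on the left for one word and heavy on the right for another is exactly the sort of thing that prompts a hypothesis worth checking by hand.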

Note that, among the capabilities of computers, such analysis, based on presenting a text in new ways, was the first to become technologically possible. When computers were relatively primitive, math-based analysis was the extent of their abilities. This, I’ve come to believe over the past few months, is why Humanities Computing has developed such a strong tradition of text analysis: it’s simply where the field is rooted. It’s tradition. Today, with many more possibilities for computing to benefit the humanities, the field is not as strongly defined. The term ‘Digital Humanities’ is picking up steam, because HuCo is no longer simply about calculations, and can no longer be described by a single verb.