Blog

A Billion eBooks

With regard to the “billion ebooks” discussion going on at Humanist, I have nothing to add that hasn’t already been said by Stephen Ramsay.

The wrong Wikipedia argument

“Scepticism about Wikipedia’s basic viability made some sense back in 2001; there was no way to predict, even with the first rush of articles, that the rate of creation and the average quality would both remain high, but today those objections have taken on the flavor of the apocryphal farmer beholding his first giraffe and exclaiming, ‘Ain’t no such animal!’ Wikipedia’s utility for millions of users has been settled; the interesting questions are elsewhere.”
- Clay Shirky, Here Comes Everybody, p.117

In my work on crowdsourcing, my advisors warn me to be careful about how I speak of Wikipedia around academics, because scholars are still divided on it. Clay Shirky’s quote perfectly encapsulates the situation: if it is clear that it works and that it works well, the question shouldn’t be “does it work?” Rather, we should be asking why it works. Kevin Kelly suggests that Wikipedia is “impossible in theory, but possible in practice”: shouldn’t we be tweaking our theories, then? Perhaps the issue is that if experts were to praise Wikipedia as reliable, they would undermine society’s need for experts. Larry Sanger, creator/co-founder of Wikipedia, says no, but it’s certainly food for thought.

Numbers

Last week I wrote about the idea of trying to model the self by collecting a series of self-revelations and organizing them in a way that may reveal insights that one had not previously considered.

This American Life had a whole episode earlier this year on trying to quantify things that should not be quantified. Quite appropriately, they have a series of stories about people who’ve tried experiments like the one I suggested, and the lessons they learned. Read the synopsis and listen to the episode at This American Life – Numbers. As with every episode of the show, it’s highly recommended.

Artistic visualisation

This isn’t so much a direct response to Rockwell and Bradley’s Printing in Sand as a reaction to it. As I read through their embrace of scientific visualisation, a thought that I’ve been tossing around came to mind again. If visualising data should be concise, precise and easy, is there any place for the abstract? That is to say, the artistic, the random, and the unfamiliar? Last year, Wordle struck a chord with the masses, despite not providing much meaning beyond a pretty word count. Perhaps there’s a place in our hearts for the puzzle graph, where we don’t know immediately what’s going on, but we like to savour the time spent figuring it out.

Why does iTunes Genius suck so much?

I’ve recently been mulling over the question, “Why does Genius suck so much?” and the implications that it has.

Genius is the playlist generation tool in Apple’s iTunes music software. You choose a song that you’re in the mood for, and it creates an entire playlist of similar songs. Essentially, it’s a recommender system: if you like x, you’ll like y. The problem is that you get a very narrow point of view, with very little genre skipping and no pleasantly clever surprises.

What sets Genius apart from other song recommender systems is that it’s essentially powered by the crowds. Apple has the luxury of a rich data set of listening habits and ratings, and that data appears to factor heavily into the recommendations. Yet algorithmic playlist generators were creating better results years before Genius came on the scene. So, what does this mean for the crowd?

The fact that computers can be better than humans at understanding art is off-putting. I’m still working through this problem, but here are some thoughts toward untangling it.

Ratings data is emotionless. When you rate a song 1 or 5, you’re giving it a universal ‘like’ or ‘dislike’. This data doesn’t factor in the mood of the song or the emotion of the listener. It is all very removed from circumstance. As I suggested to Bill Turkel, perhaps such simple crowd-based recommendations are better for high-level suggestions, like artists you may like, but useless at the micro-level (unless the data that crowds are contributing is more specific to the topic of the recommendations). In contrast, technology can be quite effective at interpreting the types and patterns of sound that represent an emotion. Certainly it can’t easily understand whether a song is good, but if you want a slow, jazzy rock song, that’s fairly achievable. In this respect music recommendation is fairly unique, as sound is easier to interpret than thousands of movie plots or millions of book themes.

Despite this, perhaps the most-cited example of a good music recommender is Pandora, which is an internet radio based on the Music Genome Project (MGP). The MGP does use humans to categorize songs, having professionals tag each song with over 400 tags and using an algorithm to weigh the values. Pandora’s success shows that humans are indeed effective at understanding music, given that they’re looking at it in the right way.

There’s also the effect of popular media, which makes human-based recommendations unbalanced. If a lot of people like Coldplay, the range of music that it will be recommended for will be broad. This additionally creates an echo loop where popular music simply grows in popularity. Conversely, it is very difficult for new music to enter the loop. If everybody who likes The Strokes likes Yeah Yeah Yeahs, the recommender will reinforce this, brushing aside any similar new bands.
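A rough sketch of that echo loop, for illustration only: the listener data and the names `count_likes` and `recommend` are made up here, and nothing suggests that Genius works this way. Ranking by raw co-occurrence lets the widely liked artist rise to the top of everyone's list, while dividing by each artist's overall popularity (a cosine-style normalization, chosen here as one common way to rebalance) gives a lesser-known band a chance.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical data: each set holds the artists one listener likes.
listeners = [
    {"Coldplay", "The Strokes", "Yeah Yeah Yeahs"},
    {"Coldplay", "The Strokes", "Yeah Yeah Yeahs"},
    {"Coldplay", "Miles Davis"},
    {"Coldplay", "Johnny Cash"},
    {"Coldplay", "Daft Punk"},
    {"The Strokes", "New Band"},
]


def count_likes(listeners):
    """Count how many listeners like each artist and each pair of artists."""
    likes = Counter()
    pairs = Counter()
    for liked in listeners:
        likes.update(liked)
        for a, b in combinations(sorted(liked), 2):
            pairs[(a, b)] += 1
            pairs[(b, a)] += 1
    return likes, pairs


def recommend(artist, listeners, normalize=False):
    """Rank the artists that are liked together with `artist`.

    With normalize=False the score is the raw number of shared listeners,
    which favours whatever is already popular. With normalize=True the
    score is divided by the square root of both artists' popularity.
    """
    likes, pairs = count_likes(listeners)
    scores = {}
    for (a, b), together in pairs.items():
        if a != artist:
            continue
        if normalize:
            scores[b] = together / sqrt(likes[a] * likes[b])
        else:
            scores[b] = together
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda name: (-scores[name], name))


print(recommend("The Strokes", listeners))
print(recommend("The Strokes", listeners, normalize=True))
```

On this toy data the raw ranking for The Strokes puts Coldplay first and the new band last, while the normalized ranking moves the new band ahead of Coldplay.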

However, such problems come down to how the algorithm is balanced. Last.fm, which tracks everything its users listen to, is fairly effective at recommending similar music. Also, because of their detailed information on what a user has listened to, they can suggest less-played songs. Though they don’t offer playlist generation, I wouldn’t put it beyond their abilities.

So where do crowds factor in here? If anything, Pandora suggests that this is best left to professionals. Certainly, you can’t get that sort of exhaustiveness with crowds. The answer may lie in reliability. Large groups would be able to make much simpler connections, but on a larger and more verified scale. When I make a playlist with Lou Reed’s Walk on the Wild Side, I always follow it with Urge Overkill’s Girl, You’ll Be a Woman Soon. The songs are linked very little, but something in me recognizes the similarly cool feeling that they give me. If you could somehow capture millions of these sorts of links, that could lead somewhere.
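One way to picture capturing those links, as a minimal sketch under assumptions: the playlists below are invented, and `count_transitions`, `likely_next` and the `min_votes` threshold are hypothetical names rather than any service's API. Each time one song directly follows another in somebody's playlist, that counts as one vote for the link, and only links with enough independent votes are trusted.

```python
from collections import Counter

# Hypothetical playlists gathered from different listeners.
playlists = [
    ["Walk on the Wild Side", "Girl, You'll Be a Woman Soon", "Heroes"],
    ["Perfect Day", "Walk on the Wild Side", "Girl, You'll Be a Woman Soon"],
    ["Walk on the Wild Side", "Heroes"],
]


def count_transitions(playlists):
    """Count how often each song is directly followed by another song."""
    transitions = Counter()
    for playlist in playlists:
        for current, following in zip(playlist, playlist[1:]):
            transitions[(current, following)] += 1
    return transitions


def likely_next(song, transitions, min_votes=2):
    """Return songs that followed `song` at least `min_votes` times.

    The threshold stands in for the "more verified" part: a link made
    by a single listener is ignored until others repeat it.
    """
    candidates = {
        following: votes
        for (current, following), votes in transitions.items()
        if current == song and votes >= min_votes
    }
    # Most-voted link first; ties are broken alphabetically.
    return sorted(candidates, key=lambda name: (-candidates[name], name))


transitions = count_transitions(playlists)
print(likely_next("Walk on the Wild Side", transitions))
```

The design choice is deliberately simple: no single listener has to explain why two songs belong together, and the link only surfaces once several people have made it independently.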

(Cross-posted to Crowdstorming. Leave any comments there.)

Returning to DIY

After my post on underestimating the ubiquity of data, Jeff Biggar asked me to expand on my prediction that “business practices and marketing will take a back seat to quality and value to society.”

You can see my response there, related to a paper that I wrote last year, but today I’d like to relate this to Willard McCarty once again, from the narrower scope of computing alone. This is partially for posterity, as his notes greatly overlap with mine, and I’d like to return to them if I ever find myself polishing my paper.

McCarty notes that we’ve seen the “gradual transfer of ability to construct artifact from highly specialized technicians to ordinary users, and the simultaneously increasing technical sophistication of these users”, or DIY computing. This has happened mainly due to three reasons: the regaining of computing unity through networking, the development of operating systems so as to free users from higher-level tasks, and an amateurization in the nature of software (notably the introduction of lower-level programming languages).

These three points provide a premise for the trend of increasingly content-driven computing. When more people are able to create, more are likely to do so when an artifact is needed. However, McCarty’s point on operating systems is important as a generalized rule: freedom from higher-level tasks. Rather than many people reworking the same problem, why not standardize the solution and let them worry about other things? The operating system takes you partway there; software libraries and modules take you further. A JavaScript library such as jQuery, for example, lets web developers stop worrying about JavaScript compatibility between browsers by offering its own functions, which it then translates properly into JavaScript based on the quirks of whichever browser it’s running in. Ruby on Rails is another web technology that builds on sensible defaults to allow users to skip higher-level concerns like full links between their modularized code, full functions for common tasks, or complex server interaction. Consider that Twitter was originally built on Ruby on Rails. Twitter was a very novel concept and, as those who’ve tried it can attest, is hard to understand in strictly abstract terms. However, Ruby on Rails allowed the creators to build Twitter as a side project, in time away from their day jobs, and to experience the new concept firsthand.

Externally looking inwards

Though class has moved on, I’m still digesting the early chapters of Willard McCarty’s Humanities Computing.

One thought that I posted during my Day of DH blogging is the idea of trying to model oneself. What if you started writing down every self-reflective thought that you have and real-life character example, and subsequently organized them into some sort of logic? Would such a systematic process allow you to derive understanding that you haven’t explicated, by virtue of it seeming “wrong” without it? Would such a removed process help you reach a more concise understanding of your quirks and your motivations?

Computational modelling and the Netflix Prize

In Humanities Computing, Willard McCarty notes that “computational form, which accepts only that which can be told with programmatic explicitness and precision, is thus radically inadequate for representing the full range of knowledge – hence useful for isolating what gets lost when we try to specify the unspecifiable.” (25). In other words, there are certain ways of knowing that we cannot explain, but because computers can only accept precise directions, they allow us to understand what’s missing when we do try to model these ways of knowing. When you attempt to explicate something human through a series of instructions, you can compare the result to what you feel the result should be, and adapt the instructions as necessary. Thus, as Willard McCarty did in his research on personification in Ovid’s Metamorphoses, the process of modelling becomes an iterative process of comparing, identifying, and changing. However, the trick to improvement is that any change affects all examples, and thus a change to accommodate an anomaly must not break the model’s tolerance for something already accounted for. Or, if it does break it, perhaps that case had not been explained by the model in the first place.

Such a process of modelling is apparent in what’s perhaps the most well-known data mining project: the Netflix Prize. The Netflix Prize is a $1 million prize being offered by Netflix to the team that can improve on the accuracy of its recommendation algorithm by 10%. The contest has been running since October 2006 and teams are in the home stretch. Eleven teams are at 9% or higher, with the top two teams at 9.64% and 9.63%. However, progress has slowed to a crawl, as the teams push the limits of how much a computer can understand the intricacies of human preference.
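For a sense of what those percentages measure: the contest scored entries by root mean squared error (RMSE) between predicted and actual ratings, and the improvement is the relative drop in that error compared with Netflix’s own system. The sketch below works through the arithmetic on a handful of made-up ratings; the function names and the numbers are illustrative assumptions, not contest data.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists of ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sqrt(sum(squared_errors) / len(actual))


def improvement_percent(baseline_rmse, candidate_rmse):
    """Relative reduction in error, as a percentage of the baseline error."""
    return 100 * (baseline_rmse - candidate_rmse) / baseline_rmse


# Made-up star ratings for four films, and two sets of predictions.
actual = [4, 3, 5, 1]
baseline_predictions = [3, 3, 4, 3]
candidate_predictions = [4, 3, 4, 2]

baseline_error = rmse(baseline_predictions, actual)
candidate_error = rmse(candidate_predictions, actual)
print(round(improvement_percent(baseline_error, candidate_error), 2))
```

Because the score is a relative reduction in error, each additional hundredth of a percent has to come from the ratings that are hardest to predict.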

Throughout the contest, teams have been remarkably open about their strategy, “acting more like academics huddled over a knotty problem than entrepreneurs jostling for a $1 million payday” (Wired – This Psychologist Might Outsmart the Math Brains Competing for the Netflix Prize). Thus, we see the effects of the iterative process of modelling: one team has an “a-ha” moment, notes the idea to the community and suddenly, everyone else has the same eureka moment.

What I find most fascinating about the prize is that there is a limit to what can be done. It apparently took only a month, out of the last two and a half years, for the leaders to get halfway there. Yet now everyone’s poring over the anomalies, and can’t quite figure out why people like the most polarizing of films. The New York Times Magazine refers to this as the “Napoleon Dynamite” problem, after one of the worst of the anomalies. Others include “I Heart Huckabees,” “Lost in Translation,” “Fahrenheit 9/11,” “The Life Aquatic With Steve Zissou,” “Kill Bill: Volume 1” and “Sideways”.

History in academia

On Humanist, Willard McCarty recently wrote an eloquent response to the question, “Why is it that you are looking to the past when you search for answers concerning the future?”, and it’s gotten people talking.

Now the question of history and precedent is an interesting one. I very much believe in founding our current knowledge on what we’ve learned from the past. At my old school, I became very outspoken about the fact that, by fourth-year seminars, my fellow communications students still had a lack of historical understanding in their thoughts, resulting in shallowness and alarmism. For example, we’d heard the exact same arguments about email and Facebook that were raised in the face of the telegraph and telephone and haven’t stood the test of time. To be premised in the present inevitably leads to a problematic and erroneous understanding of the world. This is something that I’m sure most of academia would agree with. Yet, I feel that we do not practice it.

The overwhelming feeling that I’ve had for years is that parts of the academic system are stuck on repeat. Tradition weighs heavily on us, and we find ourselves continuing decades-old practises and discourses that have not affected the world in any discernible way. We believe so strongly in history, and yet we ignore it when something has been shown to be, in the ugliest of terms, useless to society.

I’ve repressed this opinion for a long time, until a recent chat with Kathleen got me thinking about it again. It’s the very reason for my choice to study in this field: I feel like the Digital Humanities, in its unfolding state, is an area where I can make a forward-moving difference. How appropriate, then, was the timing of McCarty’s post. He surprised me by addressing this directly and, taking it a step further, did so within the context of Digital Humanities.

Take text-analysis, for example. As a whole text-analysis isn’t terribly successful or satisfying, as many others in the field keep saying, and have said year after year since the early 1960s. Indeed, the postgraduate course in text-analysis that I teach is based on the question of why it is we (firmly in the present, with eyes fixed on the then present moment) run unto a metaphorical brick wall so soon after getting started; or less metaphorically, how we can get beyond the level of the individual word and individual words nearby, lemmatized or otherwise, to whatever it is that could be considered “context”; or, more philosophically, how we can possibly justify what we consider “context” to mean in any given textual situation. …

So the literary critic or textual editor, focused on interpretation of texts, doesn’t find him- or herself in a particularly good situation with respect to computing. Yet at the same time, let us say, he or she has this nagging feeling that the computer really could be useful, somehow. And, let us say, this critic, firmly in the present moment, has ideas about what went wrong and might be done about it. Isn’t it important at such a moment to know what’s been tried already? Isn’t it equally or more important to be able to extrapolate from the trajectory that text-analysis, say, has taken all these years to where now it makes sense to go?

Sure, McCarty does not directly address my concern about historical ambivalence, but what he does suggest is that there will emerge people with feelings like mine, not finding what they have “terribly successful or satisfying”, and extrapolating from history how to evolve past the unsuccessful models. I guess the very fact of this discourse is evidence in support of Willard McCarty’s point.

A glimpse into digital culture

Everybody should drop what they’re doing and head over to Thru-You. It’s a wonderful remix project where the creator takes various samples from YouTube (a cello player here, a guitarist there, perhaps a little a cappella ditty) and makes them into songs. There is no way to overemphasize the wonder in these videos/songs.

I love these glimpses into the real-world people on YouTube. It reminds me of Mick Bianci’s phenomenal Youtubers video from over two years ago. The original video has since been removed, but luckily I found the one below.