Peter Organisciak
PhD Student, Graduate School of Library and Information Science, University of Illinois
orga...@illinois.edu

What do I do?
My research interests lie at the intersection of online systems and users. How do users self-organize within the constraints of a system, and how can systems adapt to these needs? This juncture between the humanistic look at users and the technical considerations of systems design characterizes most of my research, which includes sociological looks at online crowds, the communication of data through visualization, and online communication data mining.
Ph.D.
Advisor:
Miles Efron
Education:
MA, Humanities Computing – Library and Information Studies, University of Alberta, 2010
Honours BA, Communications and Multimedia, McMaster University, 2008
Selected Publications, Papers and Presentations
See my Curriculum Vitae!
Honors and Awards
Ian Lancashire Graduate Award (Best student paper, SDH-SEMI 2011)
Blog
Crowdsourcing Excerpts pt.2
September 22nd, 2008
A while ago, I suggested checking out Jeff Howe’s book excerpts, and tried to summarize some of the best parts. As “the best parts” grew quite long, I had to cut the post, leaving later bits unpublished.
With the book out now, I’m digging those back up. Here’s the excerpt on Chapter 5, where things start getting interesting, and some of my thoughts.
Chapter 5: The Rise and Fall of the Firm: Turning Community Into Commerce
Chapter 5 touches upon an oft-understated quality of crowds: their natural connection to community. Crowds are rarely groups of disparate human beings. Rather, they form around common connections in varying degrees of community.
Howe explains to us that communities are changing: not for the worse, but toward the more efficient. A grievance we often hear about modern society is the erosion of neighbourhood communities. However, geographically-defined communities, in their pre-World War II hey-day, were popular because they were the most accessible common-interest groups (the common-interest being the location). As new tools became available, humans have found ways of being community members with broader groups, and bound by interests beyond geography. Thus, the slide of our culture’s individuals into new depths of isolation is not the case. Rather, our communities are simply moving into less visible ground. In a great observation that I had not considered, Howe notes that now, with the ease of social connection provided by digital tools, “new types of communities have materialized that are both local and wired at the same time”.
Chapter 5 also looks at the successful online efforts of the Cincinnati Enquirer, particularly through the CincyMoms community blogs. It is a good look at do-or-die changes in publishing. There’s also a gem of information I wanted to highlight lest you miss it. With regard to a reader-submission feature on the Enquirer’s website:
The words “GetPublished” feature prominently on every Enquirer Web page. The results land in Parker’s queue, and they almost never resemble anything commonly considered journalism. “It used to read, ‘Be a Citizen Journalist,’” Parker says. “And no one ever clicked on it. Then we said, ‘Tell Us Your Story,’ and still nothing. For some reason, ‘GetPublished’ were the magic words.” The Enquirer considers the feature to be an unequivocal success.
Why I Haven’t Embraced The Business of Crowds
September 17th, 2008
If you have been sorting through this blog, you may have noticed a pattern: I rarely write about services founded on crowdsourcing as a business model. I write about small experiments, or incidental crowdsourcing, but not about the myriad crowdsourcing startups that have appeared since this blog began over a year ago.
There’s a reason for this: they rarely interest me.
There’s a time and a place for crowdsourcing, and what I love is when it is used to achieve something that cannot otherwise be created. There’s also a soft spot for cleverness in concept. However, we increasingly see social for social’s sake.
Now, bad ideas are only a fraction of sites. Many others are simply not thought out in a way that they can be successful and, unfortunately for sites built on a foundation of crowds, success and usefulness are invariably linked.
I want crowdsourcing as business to succeed, I really do, but thus far it has been most successful by accident or by incident. THAT is where the story is: in understanding this phenomenon. Clearly we have the tools, but are still working on the trade.
ReCaptcha
August 7th, 2008
I was recently forwarded a link to ReCaptcha, and was stunned to realize that I have never written about it. Stunned because ReCaptcha was one of the main sparks of my interest in crowdsourcing.
ReCaptcha is a tool out of Carnegie Mellon, headed by Luis von Ahn (mentioned previously here). To understand reCaptcha, one needs to understand captchas. A captcha is a human verification tool that displays an image with a string of warped characters. The task is to write those characters into the input box. Because of the complexity of image recognition, this task more or less confirms that you are a human and not a bot. Of course, spammers can hire low-wage captcha crackers, but captchas nonetheless introduce an enormous hurdle to online spam and other automated cons.
ReCaptcha is an improvement on the original concept. Amongst other accessibility improvements, reCaptcha’s primary innovation is that it helps digitize old books. That’s right, digitize old books. Rather than offering randomly warped words, reCaptcha instead offers the user words from scanned books that the computer recognition is having trouble with. This assists in the various efforts to digitize (and in the process preserve and recover) libraries of aging books.
The brilliance of this cannot be overstated. The tool takes an action that millions of people already need to do, and appropriates that manpower into something useful. Perhaps the best parallel is to solar energy. The sun is an energy source that is completely wasted in urban areas. It is everywhere, constantly beaming this energy onto the earth, and the cleverness of solar cells is that they allow people to capture this potential (constant and wasted) and convert it to something greatly useful. Anybody who has ever been awed by solar energy can understand the exciting potential that reCaptcha represents in technology.
New Crowdsourcing Links List
August 6th, 2008
I’ve started a page listing crowdsourcing sites and initiatives. Find it here.
Strategies for Harnessing Group Potential
July 13th, 2008
There are two main ways in which crowds are utilized in crowdsourced efforts.
The first is what I’d call the “million monkeys” strategy. Quite simply, this is the appeal to the crowd for the one or few with a commodity (be it information or material) that you need. “With a group this large, I’m bound to find what I need!” Greater size and diversity offers a bigger box to sift through, but ultimately it is a few individuals that matter, rather than the crowd itself.
The million monkey strategy is common online today. Skill auction sites, such as 99Designs, iStockPhoto, and Yahoo! Answers, reward the best provider of a solution to a problem while others watch from the sidelines. Even though the Internet does not provide anything newly achievable, in that the right connection at the right time does not necessitate a grandiose group, the number of minds online has greatly increased the potential of achieving that ideal connection.
Yet consider the example of 99Designs, where people offer bounties for their design needs, and then choose the best submission as the winner of the challenge and recipient of the payment. A few hundred dollars for a job may seem common for an entry-level designer, but that seems much smaller when one considers the discarded man-hours of the unpaid submitters. Spec (‘speculative’) work is frowned upon by professionals because of its devaluative nature, and this concern can be seen to parallel much million-monkey crowdsourcing: it skins the cow for the leather but leaves the meat to rot.
More exciting is truly collaborative crowdsourcing, because it represents possibilities for collective creation and problem-solving that have never been seen before. In the purest form, such crowdsourcing allows thousands to each contribute a small part towards a bigger picture (recall the metaphor of Ten Thousand Cents). Oftentimes, such crowdsourcing overcomes traditional organizational dilemmas, such as costs and management. For example, to categorize images en masse, as is done both actively with the ESP Game and incidentally with Flickr tagging, could not possibly be done at any reasonable rate of return prior to the arrival of the Internet.
As modern communication technology encourages crowds to grow larger while becoming more streamlined, what new problems will they come to solve? My communications education has pinned groups as collectively blunt, but now it seems that this is a result of primitive communication techniques; online, the “individual” is much more a part of the whole. If we continue to repurpose individual minds in new combinations, the results will be something not often characteristic of society: fresh.
As both the million monkey and truly collaborative approaches require the same source, a crowd, projects need not be bound to simply one approach, nor are they. For example, the ever-popular Threadless has a million monkeys system for t-shirt submissions, while using a generally democratic system of voting (mixed with some managerial liberties). If there is a community with one goal, it can be re-purposed for another. And thus we arrive at a topic for another day: the reworking of existing communities.
Crowdsourcing Excerpts pt.1
June 18th, 2008
Jeff Howe, the man who coined the term “crowdsourcing” in a June 2006 article, is in the editing stages of his new book, and wants your comments. Since April, he has been posting excerpts from Crowdsourcing: How the Power of the Crowd is Driving the Future of Business. The purpose is to solicit crowd insight and comments, the best of which may even make it into an appendix. However, at the same time, it allows us to sneak a peek inside the book.
If you worried that Howe was simply falling into writing the book by obligation, worry not: the book, as seen through the excerpts, appears to be wonderfully written and insightful. It’s a highly recommended read, and I hope that the rest keeps up the same calibre.
Here are links to, and summaries of, the parts that have been posted thus far (Chapters 2-4, more to come later). Remember that there’s an open call to contribute insight, so if you have an issue to raise, let him have it.
Chapter Two: The Rise of the Amateur excerpt #1
More setting the pace than offering insight, this excerpt introduces us to iStockPhoto, the low-cost stock photo site run by amateurs. The site is profitable (making more money for parent Getty Images than any other property), but the secret’s not in the commerce. Rather, it’s the community. The site does not exploit naive photographers by underpaying them, but rather offers them a community for improving their skills, incidentally offering the promise of a bit of money. “iStock doesn’t offer a chance to get rich. It offers the chance to make friends and become a better photographer.”
Chapter Two: Rise of the Amateur excerpt #2
The second excerpt from Chapter 2 outlines the importance of amateurs in the early stages of the Scientific Revolution. However, industrialization led to increased specialization, and society began to breed experts over polymaths. The division of labour, which reached its pinnacle with Fordism, began to cross into academia, creating segmented areas of human thinking.
This excerpt provides the setting for the story from Chapter 4 onwards, where we start to see the trend begin to reverse.
Chapter 3: From So Simple a Beginning excerpt #1
Howe introduces us to the remarkably capable open-source movement, and the equally important GNU General Public License. From such a seemingly chaotic and unstructured model as open-source comes some truly remarkable software, oftentimes better than commercial alternatives. Anybody who has used a popular Linux distro can relate to this fact.
In 1976, Bill Gates and Paul Allen wrote an “open letter to hobbyists.” It did not mince words: ‘As the majority of hobbyists must be aware, most of you steal your software.’ …The hobbyists needed professional programmers because, after all, ‘What hobbyist can put 3-man years into programming, finding all bugs, documenting his product and distribute for free?’ Gates could have never anticipated the answer to his question, which was that no single hobbyist could put 3-man years into such a daunting task, but 3,000 hobbyists easily could, and soon would.
Chapter 3: From So Simple a Beginning excerpt #2
Another great excerpt, here we look at the flaws of the patent system and are introduced to a concept that we will see in later sections: when the mixed masses do a job better than experts.
The patent system was broken. The debate now revolves around how to fix it. Over 90 percent of patent applications are successful, giving rise to a rat’s nest of vague, overlapping patents. “We wind up in these fights over patents where we can’t tell what they mean, and the courts can’t tell what they mean, and even the patentees can’t tell you what they mean,” Kappos says.
Chapter 4: Faster, Cheaper, Smarter, Easier: Democratizing the Means of Production excerpt #1
Chapter 4: Faster, Cheaper, Smarter, Easier excerpt #2
As technology has grown, the tools of experts have come into the reach of the amateur. Increasingly, the amateur is enabled to do things that had been previously guarded by experts. Along the way, there are economic casualties, be it stock photographers threatened by iStockPhoto or typesetters by home printers and desktop publishing software.
Chapter 4: Faster, Cheaper, Smarter, Easier excerpt #3
Hawthorne Heights isn’t signed to a major record label. They don’t have a giant marketing budget, nor a luxurious production and distribution model. Yet, owing greatly to the tools afforded to them through the internet, namely Myspace, their most recent album debuted at number 3.
This is where things get interesting. Why? Because the entire music business is being overhauled, and once again, the big guys are losing to the ones at the bottom of the pyramid. Though the excerpt does not delve into it, what is apparent is that the magic is in the long tail: the long tail of fans, of musicians, and record labels. The classic model was based on one big band, supported by many fans, being nurtured and enabled by the horizontal and vertical resources of the big record label. The big bang successes would usually make up for the big-budget fizzles.
In the past few years, there has been a remarkable change to this model. All the other bands, those not chosen to be made into ‘the next big thing’, have found themselves a voice on the Internet. They need neither the budget nor the promotional services which previously allowed major labels to control the market. “Most up-and-coming bands don’t regard illegal peer-to-peer file sharing as piracy; they view it as a promotional and distribution channel”. At the same time, the long tail of casual music fans has found music to be much more accessible as a hobby, and there’s something for everyone. Sales records are no longer being broken, but more bands are getting a cut, more concerts are being attended, and more music is being listened to. Yet, despite the exciting state of affairs, it comes as no surprise that the RIAA is up in arms, when the tools that they’ve been using to get a chokehold on the business (promotion, production and distribution) are no longer necessary.
Earth Crowds Classifying Space Galaxies
May 5th, 2008
Bill Dunphy recently directed me to Galaxy Zoo, an astronomy project out of Oxford that’s flown under my radar. Most basically, Galaxy Zoo offers millions of sky charts to the public, asking them to classify galaxies. Like many “artificial artificial intelligence” tasks, this is something that is immense in scale but hard to computerize.
It’s interesting to note the seriousness with which the researchers approached Galaxy Zoo, in that the content was the primary purpose and rarely is the word “experiment” tossed around in regards to the method.
By all accounts, the study was a success. Here are some notes culled from the first two papers stemming from the project.
Statistics
As of November 28th 2007, GZ had over 36 million classifications (called ‘votes’ herein) for 893,212 galaxies from 85,276 users. (Land et al. 2)
“… we are able to produce catalogues of galaxy morphology which agree with those compiled by professional astronomers to an accuracy of better than 10%.” (Lintott et al. 9).
User Reliability
For greater reliability, two methods were employed. First of all, each galaxy was classified numerous times, and the researchers would “only use objects where one class-type receives a significant majority of the votes.” This technique of independent confirmation is used in most such undertakings, as it limits individual impact by unreliable or malicious users.
There was also a weighted ranking, where “users who tended to agree with the majority have their votes up-weighted, while users who consistently disagreed with the majority have their votes down-weighted” (Land et al. 2). However, researchers did not see much difference, and chose to concentrate on the unweighted results.
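To make the two reliability methods concrete, here is a minimal sketch in Python. It is not the Galaxy Zoo code: the function names, the 0.6 threshold, and the rule of scoring each user by their rate of agreement with the consensus are all assumptions made for illustration, since the papers quoted above only describe the general idea.

```python
from collections import Counter, defaultdict


def majority_labels(votes, threshold=0.6, weights=None):
    """Keep a galaxy only when one class holds at least `threshold` of the (weighted) votes.

    votes: iterable of (user, galaxy, label) tuples.
    weights: optional {user: weight}; users not listed count as 1.0.
    The threshold value is hypothetical, not taken from the papers.
    """
    weights = weights or {}
    tallies = defaultdict(Counter)
    for user, galaxy, label in votes:
        tallies[galaxy][label] += weights.get(user, 1.0)

    accepted = {}
    for galaxy, counts in tallies.items():
        total = sum(counts.values())
        label, top = counts.most_common(1)[0]
        if total > 0 and top / total >= threshold:
            accepted[galaxy] = label
    return accepted


def agreement_weights(votes, consensus):
    """Weight each user by how often they agreed with the accepted consensus label."""
    agreed, seen = Counter(), Counter()
    for user, galaxy, label in votes:
        if galaxy in consensus:
            seen[user] += 1
            agreed[user] += int(label == consensus[galaxy])
    return {user: agreed[user] / seen[user] for user in seen}
```

A first pass with `majority_labels(votes)` gives the unweighted consensus; feeding the output of `agreement_weights` back in as `weights` gives the weighted version, so users who disagree with the majority count for less on the second pass.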
“More than 99.9% of the galaxies classified as MOSES ellipticals which are in the Galaxy Zoo clean sample are found to be ellipticals by Galaxy Zoo.” (Lintott et al. 2)
Bias and Validity
Numerous types of bias were recorded. Notably, colour-bias (where more experienced users can classify based on the prior tendencies of the specific colour) and spiral-bias were noted. The second one, as noted in “Galaxy zoo finds people are screwed up, not the Universe”, appears to be a product of human psychology, where users would tend to classify a galaxy as rotating counterclockwise, when in theory CW and CCW should be about equal.
To investigate this, the creators ran a bias sample, with occasional monochromatic, vertically-mirrored, and diagonally-mirrored images. We see this done in Luis von Ahn’s projects, with decoys being used in reCaptcha and ESP Game to help determine reliability. The GZ researchers note the Hawthorne Effect, in that “users may be more cautious with their classifications if they think that they are being tested for bias” (Lintott et al. 9). However, considering the example of reCaptcha (which offers one real word and one decoy), perhaps such an effect can be utilized fully.
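The decoy idea can be sketched in a few lines of Python. This is an illustration only, assuming the pairing works as one word whose answer is already known plus one word that still needs a human reading; `check_response` and its arguments are hypothetical names, not part of any reCaptcha API.

```python
def check_response(known_answer, typed_known, typed_unknown):
    """Trust the reading of the unknown word only if the known word was typed correctly.

    Returns the normalised reading of the unknown word, or None when the
    known (decoy) word was wrong and the whole response should be discarded.
    """
    def normalise(text):
        return text.strip().lower()

    if normalise(typed_known) == normalise(known_answer):
        return normalise(typed_unknown)
    return None
```

Because the user cannot tell which word is the decoy, they have a reason to be careful with both, which is the Hawthorne Effect put to work rather than treated as a nuisance.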
Participation
To get as many users as possible, simplicity and a low barrier to entry were extremely important considerations in the design. “Visitors to the site were asked to read a brief tutorial giving examples of each class of galaxy, and then to correctly identify a set of ‘standard’ galaxies. …Those who correctly classified more than 11 of the 15 standards were allowed to proceed to the main part of the site. The bar to entry was kept deliberately low in order to attract as many classifiers to the site as possible” (Lintott et al. 3).
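The entry test described in that quote is easy to express in code. Here is a minimal Python sketch, assuming only what the quote states (15 standard galaxies, more than 11 correct to pass); the function name and the data layout are made up for the example.

```python
def passes_entry_test(answers, standards, min_correct=12):
    """Return True when the visitor matched enough of the 'standard' galaxies.

    answers:   {galaxy_id: label chosen by the visitor}
    standards: {galaxy_id: expected label}
    min_correct=12 encodes "more than 11 of the 15 standards".
    """
    correct = sum(
        1 for galaxy_id, expected in standards.items()
        if answers.get(galaxy_id) == expected
    )
    return correct >= min_correct
```

Unanswered standards simply count as wrong here, which is one of several reasonable choices.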
The majority of users classified around 30 galaxies. As the following chart shows, however, some went up to the tens of thousands. Even though the use of crowds to dissipate individual time obligations is the core purpose of such a system, it is very beneficial to accommodate the “super-users”, who do hundreds of times as much work as the casual user.
Links
Bad Astronomy: AAS #14: Galaxy zoo finds people are screwed up, not the Universe
Betsy Devine: Ox, Docs Shocks!
Colorado Rockies
May 4th, 2008

Ten Thousand Cents and the Normalizing Power of Crowds
April 18th, 2008
Back in January, when I demonstrated the Mechanical Turk to my Crowdsourcing students, I would show them one particularly cryptic project. It was simply two boxes. The one on the left held an apparently zoomed-in image, while the one on the right was blank. With a simple brush, you were asked to redraw the image on the right. Colours were chosen with a colour dropper, and an adjustable ghost image in the right box made tracing easy. We all knew that we were creating a larger image, guessing it was an art project, but I did not think it could possibly turn out very effective.
I was wrong. The results of that project have surfaced, in the form of “Ten Thousand Cents”. Turns out we were drawing a one hundred dollar bill.
The total labor cost to create the bill, the artwork being created, and the reproductions available for purchase are all $100. The work is presented as a video piece with all 10,000 parts being drawn simultaneously. The project explores the circumstances we live in, a new and uncharted combination of digital labor markets, “crowdsourcing,” “virtual economies,” and digital reproduction.
This project serves as a brilliant metaphor of the normalizing power of crowds. When you open up a project to the masses, governance becomes extremely difficult. Anybody is given the ability to contribute erroneous information. However, as you gain a larger community of contributors, things balance out despite the fouls. Consider opinion-based efforts, such as Digg and Travelocity: eventually, the best items shine through. That is why Wikipedia is so reliable considering the circumstances: because thousands of editors are better than one. So how is Ten Thousand Cents relevant?
Still Ben, right?
Dolores Labs: Crowdsourcing Matures Into A Skill
March 31st, 2008
Dolores Labs is a new service that helps clients crowdsource their projects online. Specializing in Mechanical Turk, Dolores Labs has put online two fun example studies.
The first is a classification of Sports Illustrated covers over the past thirty years. Covers were classified by race of the athletes featured and the sport featured. Having recently led coding for a school study, involving a 2-week census of Digg.com front page stories, I can certainly appreciate how appropriate the Turk is for coding with such straightforward, reliable variables.
The second example is even more fascinating. Providing Turkers with thousands of random colours, Dolores simply asked each colour to be named by the worker. What resulted is a fascinating dataset of human-interpreted colour descriptions. You see the common colour names pop up, but more interesting is how the workers used language to describe those colours that were more difficult to classify.
Essentially, Dolores Labs is a crowdsourcing consulting company. Even though they provide deeper services than simply advice, their main commodity is the knowledge of how crowdsourcing works. There are good ways to mobilize crowds and incorrect or useless ways to do so, and as we come to realize that, crowdsourcing moves beyond simply a trend and into a bona fide tool. The existence of a group that specializes in understanding the process shows a maturing of crowdsourcing within culture as a viable method for abstract analysis.
Dolores Labs aren’t even the first ones selling their expertise on crowdsourcing. Amsterdam-based CreativeCrowds have been doing a similar thing for a while. Like Dolores Labs, they also give back to the public, not in the form of test data but in their phenomenal blog, CrowdSourcingDirectory. Both companies are approaching this the right way, and I hope to see more from both in the future.



