Old Slang: Appreciating Webster’s with Bots

The richness of language can be under-appreciated because of its mundane nature. James Somers’s essay “You’re probably using the wrong dictionary” recently turned me on to old dictionaries. With their colorful descriptions and honest uncertainties, they are far more gratifying than what we’ve come to expect of dictionaries. Modern dictionaries give you matter-of-fact descriptions of words you don’t know. Older dictionaries have a vivid, more exciting style that is just as likely to enlighten you about words you do know. Tracking down John McPhee’s references to his own dictionary, Somers recommends the 1913 edition of Webster’s Revised Unabridged Dictionary.

Reading Webster’s 1913 is a satisfying exercise. What strikes me most, however, are its descriptions of slang, colloquialisms, and vulgarities: terms or usages that are informal and conversational. The dictionary’s etymology for slang notes its roots in “having no just reason for being.” With these entries, a work now seen as a record of American English is defining language that, by its own description, is “unauthorized.”

The tension yields a wonderful series of entries, some of them very familiar:


Add user pseudonyms in data analysis

When analyzing anonymous user data in a team, I often take an extra step to help discussion: converting user identifiers into pseudonyms drawn from popular English names.

Pseudonyms tend to make the data more welcoming to team members who aren’t working directly with it, and they make trends and outliers easier to follow. They also help with visual sanity checks during analysis. Names are easier to remember than IDs, so problems are easier to spot when you inspect the data.

The Social Security Administration readily provides lists of popular baby names, and I usually keep a derivative text list handy. In the simplest case, you can convert each unique ID into a name. When I want to keep name assignments from shifting as the data changes, I save the ID-to-name mappings in a basic CSV.

Below is a very basic example written in R to show how easy it is to do:
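The same steps can also be sketched in Python. This is a minimal, hypothetical sketch and not the author’s R code. It assumes the name list is already loaded; the short `NAMES` list and the function names are illustrative. The sketch assigns names to unique IDs in order of first appearance and keeps the ID-to-name mapping in a two-column CSV so assignments stay stable across runs.

```python
import csv
import os

# Hypothetical name list; in practice, read it from a text file derived from
# the Social Security Administration's baby-name data, one name per line.
NAMES = ["Olivia", "Liam", "Emma", "Noah", "Ava", "Elijah", "Sophia", "James"]


def load_mapping(path):
    """Read saved ID -> name assignments from a two-column CSV, if it exists."""
    mapping = {}
    if os.path.exists(path):
        with open(path, newline="") as f:
            for user_id, name in csv.reader(f):
                mapping[user_id] = name
    return mapping


def assign_pseudonyms(user_ids, names, mapping):
    """Give each unseen ID the next unused name, keeping existing assignments."""
    used = set(mapping.values())
    available = (n for n in names if n not in used)
    for user_id in map(str, user_ids):  # CSV round-trips IDs as strings
        if user_id not in mapping:
            name = next(available, None)
            if name is None:
                raise ValueError("ran out of names; use a longer name list")
            mapping[user_id] = name
    return mapping


def save_mapping(path, mapping):
    """Write the ID -> name assignments so later runs reuse them."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(mapping.items())
```

For example, `assign_pseudonyms(["u1", "u2", "u1"], NAMES, {})` returns `{"u1": "Olivia", "u2": "Liam"}`. The function only hands out unused names to IDs it hasn’t seen. As long as the saved CSV is loaded first with `load_mapping`, new IDs never disturb existing assignments.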

Low-Effort Crowdsourcing

Sentence generation with choice-based typing. The program prompts a user to choose between two words likely to follow the previous words, letting them build a whole sentence through low-effort interaction. Programmed by Jeff Bigham.

How small can a crowdsourcing contribution be?

At November’s CrowdCamp workshop, a group of us got together and prototyped a number of sample systems to see how low-effort crowdsourcing would work. We posted a report at Follow the Crowd.

Our prototypes were silly at times, but they helped us think about which low-effort input methods, paired with which non-distracting user contexts, would make low-effort crowdsourcing work.

The ideas we prototyped, available on GitHub, include:

  • A binary tweeting interface that lets you type sentences by choosing between common words
  • A passive image voting interface that captures a user’s smile as a ‘like’
  • A browser extension proof-of-concept that lets a worker complete tasks while a page is loading
  • A hot-or-not style interface for choosing the better of two choices. The twist is that you’re choosing using affirmative grunts, so you can play it while listening (or pretending to listen?) to somebody!
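The binary tweeting idea can be sketched with a simple bigram model. It counts which words follow which in a corpus, then offers the user the two most frequent successors of the last word at each step. This is a hypothetical sketch, not the prototype’s code. The toy corpus, the function names, and the use of bigram counts to choose the two candidates are all assumptions. The `pick` callback stands in for the user’s choice.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; a real system would need a much larger text source.
CORPUS = """
the cat sat on the mat . the dog sat on the rug .
the cat ate the fish . the dog ate the bone .
"""


def build_bigrams(text):
    """Count which word follows which (a simple bigram model)."""
    follows = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows


def two_choices(follows, prev_word):
    """Return up to two of the most frequent words seen after prev_word."""
    return [w for w, _ in follows.get(prev_word, Counter()).most_common(2)]


def generate(follows, start, pick, max_words=10):
    """Build a sentence by repeatedly asking `pick` to choose between two words.

    `pick` stands in for the user: it receives the candidate words and
    returns one of them, or None to stop early.
    """
    sentence = [start]
    while len(sentence) < max_words:
        options = two_choices(follows, sentence[-1])
        if not options:
            break
        choice = pick(options)
        if choice is None or choice == ".":  # "." ends the sentence
            break
        sentence.append(choice)
    return " ".join(sentence)
```

With `follows = build_bigrams(CORPUS)`, the user first chooses between “cat” and “dog” after “the”. Always taking the first option for four words gives “the cat sat on”. Limiting each step to two options keeps every interaction to a single binary choice. The cost is that less common continuations (like “fish” here) can never be reached.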

Uh-huh. Yeah.

The emotive voting interface ‘likes’ an image if you smile while the image is on the screen, and ‘dislikes’ if you frown.

Details are at Follow the Crowd. The team was Jeff Bigham, Kotaro Hara, Rajan Vaish, Haoqi Zhang, and myself.