Rico's rants: Auto-correct doesn't

08 August 2012

James Gleick, the author, most recently, of The Information: A History, a Theory, a Flood, has an article in The New York Times about common software issues:

I mention a certain writer in an email, and the reply comes back: Comcast McCarthy??? Phoner novelist??? Did I really type Comcast? No. The great god Autocorrect has struck again.
It is an impish god. I try retyping the name on a different device. This time the letters reshuffle themselves into Format McCarthy. Welcome to the club, Format. Meet the Danish astronomer Touchpad Brahe and the Franco-American actress Natalie Portmanteau.
In the past, we were responsible for our own typographical errors. Now Autocorrect has taken charge. This is no small matter. It is a step in our evolution— the grafting of silicon into our formerly carbon-based species, in the name of collective intelligence. Or unintelligence, as the case may be.
Earlier this year, the police in Hall County in Georgia locked down the West Hall schools for two hours after someone received a text message saying, “gunman be at west hall today.” The texter had typed “gunna,” but Autocorrect had a better idea.
Who’s the boss of our fingers? Cyberspace is awash with outrage. Even if hardly anyone knows exactly how it works or where it is, Autocorrect is felt to be haunting our cellphones or watching from the cloud.
Peter Sagal, the host of NPR’s Wait Wait ... Don’t Tell Me! complains via Twitter: Autocorrect changed ‘Fritos’ to ‘frites.’ Autocorrect is effete. Pass it on.
Its cultural status can be judged from the websites and blogs devoted to it, from the stream of whinging on Twitter, and from the appearance this summer of The New Yorker’s first Autocorrect cartoon. (A hot dog vendor dashes to the pitcher’s mound; the manager looks at his hand-held device and says: “Oh, I see what happened. Autocorrect changed ‘southpaw’ to ‘sauerkraut.’”)
Tweets the actor and author Stephen Fry: “Just typed ‘better than hanging around the house rating bisexuals’ to a friend. Thanks, Autocorrect. Meant ‘eating biscuits.’ ”
We are collectively peeved. People blast Autocorrect for mangling their intentions. And they blast Autocorrect for failing to un-mangle them.
I try to type geocentric, and discover that I have typed egocentric; is Autocorrect making a sort of cosmic joke? I want to address my tweeps (a made-up word, admittedly, but that’s what people do). No: I get “twerps.” Some pairings seem far apart in the lexicographical space. Cuticles becomes citified. Catalogues turns to fatalities and Iditarod to radiator. What is the logic?
The logic is hard to discern, and consistency is for hobgoblins. Sometimes Capistrano may become “vapid tramp”; next time maybe “campus tramp.” Kathryn Schulz, the author of Being Wrong, tweets in verse:
Super fans
sweaty fans
sweaty dreams
sweet dreams.
Autocorrect train wreck over here.
Actually, an assortment of competing algorithms is at work. Autocorrect is not a single entity but a hodgepodge, from different vendors, chief among them Apple, Google, and Microsoft. All their algorithms start with the low-hanging fruit. They know what to do when you type “hte”. After that, their goals vary, and so do their capabilities. On most devices and applications, Autocorrect can be switched off, for those who prefer to go naked. It’s not always easy to find the switch. On mobile phones, where our elephant thumbs tramp across tiny keypads, the idea is to free us from backtracking and drudgery. The iPhone’s Autocorrect function loves to insert apostrophes. You can rely on it: type “dont” and get “don’t.” Type “cant” and get “can’t”, but is that what you wanted? Autocorrect is just playing the odds. Even “ill” turns to “I’ll” and “id” to “I’d” (sorry, Dr. Freud).
When Autocorrect can reach out from the local device or computer to the cloud, the algorithms get much, much smarter. I consulted Mark Paskin, a longtime software engineer on Google’s search team. Where a mobile phone can check typing against a modest dictionary of words and corrections, Google uses no dictionary at all.
“A dictionary can be more of a liability than you might expect,” Paskin says. “Dictionaries have a lot of trouble keeping up with the real world, right?” Instead Google has access to a decent subset of all the words people type— “a constantly evolving list of words and phrases,” he says; “the parlance of our times.”
If you type “kofee” into a search box, Google would like to save a few milliseconds by guessing whether you’ve misspelled the caffeinated beverage or the former United Nations secretary-general. It uses a probabilistic algorithm with roots in work done at AT&T Bell Laboratories in the early 1990s. The probabilities are based on a “noisy channel” model, a fundamental concept of information theory. The model envisions a message source— an idealized user with clear intentions— passing through a noisy channel that introduces typos by omitting letters, reversing letters, or inserting letters.
“We’re trying to find the most likely intended word, given the word that we see,” Paskin says. Coffee is a fairly common word, so with the vast corpus of text the algorithm can assign it a far higher probability than Kofi. On the other hand, the data show that spelling coffee with a K is a relatively low-probability error. The algorithm combines these probabilities. It also learns from experience and gathers further clues from the context.
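(An aside that is not part of Gleick's article: the idea of combining how common a word is with how likely a given typo is can be sketched in a few lines of Python. This is only a toy under loud assumptions. The word counts in WORD_COUNTS are made up, EDIT_PROB is an arbitrary flat per-typo probability, and the names edit_distance, score, and correct are hypothetical. It is not Google's method; a real system would weight individual errors differently, for example treating "k" for "c" as rarer than a doubled letter, and would also use context.)

```python
from math import log

# Made-up word counts standing in for a very large corpus (assumption).
WORD_COUNTS = {"coffee": 9000, "kofi": 40, "toffee": 300, "the": 50000, "tea": 4000}
TOTAL = sum(WORD_COUNTS.values())

# Assumed flat probability of any single typo; purely illustrative.
EDIT_PROB = 0.01


def edit_distance(a, b):
    """Count the inserts, deletes, substitutions, and adjacent swaps
    needed to turn a into b (optimal string alignment distance)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # letter omitted
                d[i][j - 1] + 1,         # letter inserted
                d[i - 1][j - 1] + cost,  # letter substituted (or matched)
            )
            # Two adjacent letters reversed, as in "hte" for "the".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def score(typed, intended):
    """log P(intended) + log P(typed | intended) under the toy model."""
    prior = WORD_COUNTS[intended] / TOTAL
    channel = EDIT_PROB ** edit_distance(typed, intended)
    return log(prior) + log(channel)


def correct(typed, max_edits=2):
    """Return the most probable intended word, or the input if nothing is close."""
    candidates = [w for w in WORD_COUNTS if edit_distance(typed, w) <= max_edits]
    if not candidates:
        return typed
    return max(candidates, key=lambda w: score(typed, w))


if __name__ == "__main__":
    for word in ("kofee", "hte", "kofi"):
        print(word, "->", correct(word))
```

In this toy, "kofee" is two typos away from both "coffee" and "kofi", so the channel term is a tie and the far more common word wins. Typing "kofi" exactly leaves it alone, because zero typos beats any number of them.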
The same probabilistic model is powering advances in translation and speech recognition, comparable problems in artificial intelligence. In a way, to achieve anything like perfection in one of these areas would mean solving them all; it would require a complete model of human language. But perfection will surely be impossible. We’re individuals. We’re fickle; we make up words and acronyms on the fly, and sometimes we scarcely even know what we’re trying to say.
One more thing to worry about: the better Autocorrect gets, the more we will come to rely on it. It’s happening already. People who yesterday unlearned arithmetic will soon forget how to spell. One by one we are outsourcing our mental functions to the global prosthetic brain.
I can live with that. We do it with memory, we do it with navigation, so what the he’ll, let’s do it with spelling.

Rico says he can spell quite well on his own, thank you. Autocorrect is, undoubtedly, responsible for the complaints he gets from readers about misspellings in his Rant...