3 Strengths and Weaknesses of the Watson Computer on 'Jeopardy!'
Tonight, the first of three "Jeopardy!" programs airs, chronicling the showdown between an IBM computer named Watson and the quiz show's top human competitors of all time, Ken Jennings and Brad Rutter.
Heading into the contest, Watson does have some clear advantages over its challengers, but then again, the machine has a few significant vulnerabilities that might make all the difference.
Here is a rundown of three of the key strengths and weaknesses of a metals-and-wires quiz show contestant compared to flesh-and-blood players.
— Watson cannot "forget" anything. The machine's memory banks are stuffed with approximately 200 million pages of references, including encyclopedias, thesauri, full texts of books, plays, screenplays and much more. IBM programmers also fed Watson tens of thousands of previous "Jeopardy!" clues so it could learn how to handle the variously tricky answer types.
Notionally, Watson's game prep is not dissimilar to a human player cramming for a match, although Watson can bone up with a lot more practice questions and "store" far more source material than a person. However, it should be noted that Watson has not had decades of learning and living under its belt (the project began four years ago), so perhaps it's only fair that the machine gets to make up for lost time, so to speak.
But Watson's real advantage: Unlike our weirdly porous, mutative memories, the computer will not transpose details and dates, nor will it forget stuff that it once knew, such as high school trigonometry, to give one all-too-human example.
— Watson has no emotions. In sharp distinction to its human competitors, Watson cannot get rattled if it answers a clue wrong or craters its cash score by blowing a big Daily Double.
The computer also doesn't have to deal with any pre- or in-show jitters — after all, going on national TV in an attempt to prove you're smart and win money can be a bit nerve-wracking.
Of course, Jennings and Rutter are seasoned veterans of the quiz show, and neither is likely hurting in the financial department, so this might not be a huge point in Watson's favor in this particular contest.
— Watson is a statistical beast. It could be said in fairness that Watson doesn't really "know" anything — its knowledge is based on statistics, not learned factoids. When searching for the answer to a clue, Watson dissects the clue word-by-word and performs searches through its memory banks for any evidence of these words and analyzes the context in which they appear.
Admittedly, this doesn't sound very different from how neuroscientists describe our brains working; however, compared to us, Watson can plow through considerably more material, such as the text of all of Shakespeare's plays (in their complete, original form, unlike the fragments we'd recall), within a second or two, courtesy of its nearly 3,000 processors running in parallel.
Having run the clue's words through hundreds of algorithms, Watson weighs its statistical confidence in possible answers, and that confidence must cross a threshold before the machine will risk buzzing in and possibly answering incorrectly. Overall, the statistical basis for Watson's "Jeopardy!" playing is rigorous and robust in a way we cannot equal. However…
— Watson has no sense of humor. "Jeopardy!" is, at its core, meant to entertain, and accordingly its categories and clues often involve humorous aspects, such as puns, riddles and amusing idioms, not to mention frequent pop cultural references.
Figuring out the actual meanings of phrases in this so-called "natural language" that people use has been singled out as the greatest challenge Watson's programmers faced. And as far as Watson has come, it still must grapple with the language of the clue, while human contestants can focus more on reaching for the answer.
An example of this humorous natural language that might be harder on Watson than a person is the category "Nice 'AB's!" that appeared on the Nov. 1, 2010, show, in which all answers would contain the paired letters "ab." An example clue: "It means detestable or loathsome, though I have no beef with the snowman, myself." Most of us will come up with "abominable" without much difficulty, but Watson has the disadvantage of having to sort out what is meant by "beef." (Still, by looking up "detestable" in a thesaurus and cross-referencing with "snowman," Watson might indeed arrive at the correct answer quickly.)
For the sake of further example, the following strangely spelled clue also appeared in this same category: "Heeeeey, this job title for the head of a monastery." (Answer: abbot, with an Abbott and Costello reference.)
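The thesaurus cross-referencing suggested above for the "abominable" clue can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny thesaurus, the phrase list and the function name `candidates` are invented here and are not Watson's actual data or code.

```python
# Toy data, invented for illustration; not Watson's actual sources.
THESAURUS = {
    "detestable": ["abhorrent", "abominable", "hateful", "odious"],
    "loathsome": ["abominable", "repulsive", "vile"],
}
# Hypothetical phrase list standing in for a large text corpus.
KNOWN_PHRASES = ["abominable snowman", "hateful speech", "vile weather"]


def candidates(clue_words, required_letters, context_word):
    """Return synonyms of the clue words that contain required_letters
    and appear alongside context_word in a known phrase."""
    found = []
    for word in clue_words:
        for synonym in THESAURUS.get(word, []):
            # The category demands the paired letters (here "ab").
            if required_letters not in synonym:
                continue
            # Cross-reference: does the synonym co-occur with the context word?
            for phrase in KNOWN_PHRASES:
                tokens = phrase.split()
                if synonym in tokens and context_word in tokens:
                    if synonym not in found:
                        found.append(synonym)
    return found


print(candidates(["detestable", "loathsome"], "ab", "snowman"))
```

Note that the sketch never has to work out what "beef" means: the category constraint and the snowman reference are enough to narrow this toy thesaurus down to a single candidate.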
— Watson cannot time its buzz-in to the reading of a clue or "buy time." While playing "Jeopardy!", contestants can buzz in when a light goes on after host Alex Trebek stops reading the clue (click early on your button and you're penalized a crucial quarter-second). Watson does not have speech recognition software that allows it to "hear" Trebek; instead, clues are transmitted electronically to the machine, so it gets to "see" a clue at the same time that a contestant does.
Due to concerns that Watson could beat humans to the buzzer, IBM hooked Watson up to a signaling device that requires a physical pressing of a button, the same as human players. Ultimately in this setup, however, the advantage goes to us air-breathers because we can time our buzz-in to Trebek's clue reading.
"The reaction time of a machine is faster than a human, but that's not what matters in "Jeopardy!" — what matters is timing," said IBM's David Gondek, a researcher on the Watson project. "The best players read the clue, try to determine when Alex will finish reading, and then buzz in." Gondek said that "Watson is a strong buzzer, but Watson is not anticipating Alex's reading, it is reacting to a signal. A human can come in before Watson."
Also, as mentioned above, Watson is not programmed to buzz in until it has high confidence in an answer; a human can buzz in, though, and by rule have five seconds to work with before having to provide an answer. Human players, in other words, can take a risk and buy themselves time to come up with an answer, which is not a bad gambit if the category is an area of familiarity. Watson does not have this strategic luxury.
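Watson's confidence-gated buzzing can be pictured with a minimal Python sketch. The threshold value, the function name `decide_buzz` and the confidence numbers are assumptions made up for illustration; the article does not say what cutoff Watson uses or how its confidences are computed.

```python
from typing import Dict, Optional

# Hypothetical cutoff; the real threshold is not given in the article.
BUZZ_THRESHOLD = 0.5


def decide_buzz(confidences: Dict[str, float],
                threshold: float = BUZZ_THRESHOLD) -> Optional[str]:
    """Return the top-scoring answer if its confidence reaches the
    threshold, otherwise None (meaning: do not buzz in)."""
    if not confidences:
        return None
    best = max(confidences, key=confidences.get)
    return best if confidences[best] >= threshold else None


# Made-up confidence values for two candidate answers.
print(decide_buzz({"abbot": 0.9, "monk": 0.05}))  # confident: buzz with "abbot"
print(decide_buzz({"abbot": 0.3, "monk": 0.2}))   # unsure: stay silent (None)
```

A human player effectively runs with a lower threshold in familiar categories, buzzing first and using the five seconds to settle on an answer, which is the gambit this sketch rules out for the machine.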
— Watson has never lived, played in or explored the real, physical world. Watson cannot implicitly draw upon real-life experiences — it is a bunch of servers in a room.
Commonsensical, prepositional relationships that we take for granted might momentarily boggle Watson. We know if we look down from a roof, we see the ground, or a gutter, perhaps; Watson does not have it that easy.
Sense-related questions involving smell, touch or taste, for example, should also theoretically take Watson a bit longer to parse than us because we can directly understand "stinky," "cold" or "spicy."
Watson, like us, is capable of learning, as its creators have made clear. When a certain algorithm leads to a correct answer, Watson leans on that algorithm a tad more moving forward. Given enough time, maybe even Watson will digitally salivate, as it were, at this clue from an October 2008 show: "This spicy "bread" that's flavored with molasses is often used to make a man or even a house." (Answer: gingerbread.)
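The idea of leaning "a tad more" on an algorithm that led to a correct answer can be illustrated with a simple multiplicative weight update. This is a generic technique swapped in for illustration, not IBM's documented method; the algorithm names, the `boost` size and the function `update_weights` are all hypothetical.

```python
def update_weights(weights, correct_algorithms, boost=0.1):
    """Return new weights in which every algorithm that backed the
    correct answer is nudged up by `boost`, then all weights are
    renormalized so they sum to 1."""
    raised = {
        name: w * (1 + boost) if name in correct_algorithms else w
        for name, w in weights.items()
    }
    total = sum(raised.values())
    return {name: w / total for name, w in raised.items()}


# Hypothetical algorithms with equal starting weights.
weights = {"thesaurus_lookup": 0.5, "date_matching": 0.5}
weights = update_weights(weights, {"thesaurus_lookup"})
print(weights)  # thesaurus_lookup now weighs slightly more than date_matching
```

Repeated over thousands of practice clues, small nudges like this would gradually shift trust toward whichever algorithms tend to be right.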