Transparency in artificial intelligence

I remember watching with interest more than a decade ago as IBM’s Deep Blue took on Garry Kasparov. Now, IBM has embarked on a new venture in artificial intelligence, an open-domain question-answering system they call Watson. Watson is far more than a database; Watson is about parsing questions posed in natural language, retrieving information, and generating and testing hypotheses to answer those questions. Watson’s big debut will come next week, when he participates in the first man-machine Jeopardy tournament. While it may be just a game show, for IBM it represents the first real test of the technology, and a chance to introduce the public to a technology that IBM thinks may revolutionize how we use computers and how we answer questions:

“I want to create a medical version of this,” he [John Kelly, IBM’s head of research] adds. “A Watson M.D., if you will.” He imagines a hospital feeding Watson every new medical paper in existence, then having it answer questions during split-second emergency-room crises. “The problem right now is the procedures, the new procedures, the new medicines, the new capability is being generated faster than physicians can absorb on the front lines and it can be deployed.” He also envisions using Watson to produce virtual call centers, where the computer would talk directly to the customer and generally be the first line of defense, because, “as you’ve seen, this thing can answer a question faster and more accurately than most human beings.”

But how does Watson work? What makes Watson tick, and how does he really perform? As the New York Times reported in June, IBM’s not saying:

Ferrucci refused to talk on the record about Watson’s blind spots. He’s aware of them; indeed, his team does “error analysis” after each game, tracing how and why Watson messed up. But he is terrified that if competitors knew what types of questions Watson was bad at, they could prepare by boning up in specific areas. I.B.M. required all its sparring-match contestants to sign nondisclosure agreements prohibiting them from discussing their own observations on what, precisely, Watson was good and bad at. I signed no such agreement, so I was free to describe what I saw; but Ferrucci wasn’t about to make it easier for me by cataloguing Watson’s vulnerabilities.

But this raises an important question: what if Watson is wrong? A game of Jeopardy is one thing, but what if we ask Watson a tough question, one that we don’t have the answer to ourselves, and he’s wrong? What if that results in substantial real-world consequences, like a patient dying or a megaproject failing? And what if Watson was wrong not because he didn’t have enough data, or the right data, but because of a simple bug?

If I am going to bet my life on a medical decision made by Watson, or stake the future of my business on a financial projection, I am going to want to see every last bit of data that went into that decision. I am going to want to know exactly how Watson came to the conclusion he did, and I am going to want those processes to be completely open for auditing. The transparency must extend through the entire system, from the input data to the algorithms used to process it. IBM would probably tell us that even if we had access to Watson’s data and algorithms there’s nothing we could do with them, because we haven’t got petabytes of storage and a Blue Gene/P supercomputer lurking in the basement. But storage gets cheaper every day, and so does CPU time. There’s no reason that a major university, or consortium of universities, couldn’t build a slightly slower Watson. And it’s not just about being able to run your own Watson-clone; it’s about being able to inspect the algorithms that make it work.

Transparency cannot stop at the input data; it must extend to the output as well. Some have drawn parallels between Watson and Wolfram Research’s Wolfram Alpha. From the end-user’s perspective, Wolfram Alpha is a black box. You don’t get to know precisely where the data comes from, nor how the answers are arrived at. Worse, your rights to use the output generated by your queries are substantially limited by Wolfram Alpha’s Terms of Use. It might seem simple enough to just steer clear of the Wolfram Alpha web site, and thus avoid any entanglement with Wolfram Research’s lawyers, but the release of Mathematica 8 made the situation more complicated. Mathematica 8 includes built-in functions which will send a query off to Wolfram Alpha and then return the result to your Mathematica notebook. This includes not only queries of the type you might enter on the Wolfram Alpha site (things like “population of Idaho”) but also free-form statements which are translated into Mathematica syntax. It is, on the surface, brilliant. It brings us very close to a science-fiction future of being able to simply tell the computer to do what we want: to “blur the image”, or “add a light teal frame and orange grid lines”, to use Stephen Wolfram’s examples. But what Wolfram Research won’t tell you, unless you dig deep in the documentation, is that using those functions infects your Mathematica notebook with Wolfram Alpha output, owned by Wolfram Research and subject to the Wolfram Alpha Terms of Use.

There’s a Wolfram Alpha appliance, too, but even if you install one of those in your datacenter you still don’t really own it, and you certainly don’t know what’s going on inside. I would not be surprised if, like the Google Search Appliance, the Wolfram Alpha appliance was rigged with a tamper switch so that it (and Wolfram Research) would know if someone tried to take a peek inside.

What we need is a paradigm shift in how we treat these kinds of computational services; we must demand transparency throughout the process, and we must be clear about who owns what. Take this excerpt from the Wolfram Alpha Terms of Use, for example:

The specific images, such as plots, typeset formulas, and tables, as well as the general page layouts, are all copyrighted by Wolfram|Alpha at the time Wolfram|Alpha generates them. A great deal of scholarship and innovation is included in the results generated and displayed by Wolfram|Alpha, including the presentations, collections, and juxtapositions of data, and the choices involved in formulating and composing mathematical results; these are also protected by copyright.

Adobe has never tried to assert a claim of copyright over images I manipulate in Photoshop, nor has Microsoft ever tried to assert a claim of copyright over documents I prepare in Word or Excel. Some might say that that’s not comparable; there’s no scholarship in Word or Excel (which explains why Word and Excel documents look as poor as they do). But Donald Knuth doesn’t try to claim copyright over documents I’ve prepared using TeX, either, and a monumental amount of scholarship and innovation went into the design of TeX. Even if I were to profit massively from a book typeset using TeX, Knuth would still have no legal interest in it whatsoever. For me, Wolfram Research’s argument falls flat.

So, where does this leave Watson? IBM has yet to try to commercialize Watson, but when they do, there must be an open and frank discussion about these issues. I hope it will ultimately lead to a new, more transparent approach to the development of systems like Watson and Wolfram Alpha. And I hope that if we are expected to make vital decisions based on the output of these systems, we will not be asked to take on faith that Watson or Wolfram Alpha works according to expectations, but will instead be given all of the data we need to confirm for ourselves that the system works as expected and that we can indeed trust its results. Unlike Wolfram Alpha, Watson doesn’t yet generate novel output, just answers to questions. However, when Watson starts generating novel output, unless we want to grant it personhood (a legal quagmire) and assign copyright to it, it makes little sense to say that, as in the case of Wolfram Alpha, we should assign copyright to IBM because they built Watson. No, copyright should be assigned, just as in the case of any other tool, to the person or organization who operates the tool. Watson may be novel, a “thinking” machine, but it is still just a computer program. We should not put blind faith in Watson without seeing the input (“garbage in, garbage out”), and we should not treat the output of Watson (or Wolfram Alpha, for that matter) as being somehow sacred in a way that the output of other software systems is not.