Tuesday, March 26, 2013

"With Big Data, we are creating artificial intelligences that no human can understand"

From Quartz:

The basis for an algorithm's predictions may be beyond the understanding of the average human

Excerpted from BIG DATA: A Revolution That Will Transform How We Live, Work, and Think by Viktor Mayer-Schönberger and Kenneth Cukier.
Computer systems currently base their decisions on rules they have been explicitly programmed to follow. Thus when a decision goes awry, as is inevitable from time to time, we can go back and figure out why the computer made it. For example, we can investigate questions like “Why did the autopilot system pitch the plane five degrees higher when an external sensor detected a sudden surge in humidity?” Today’s computer code can be opened and inspected, and those who know how to interpret it can trace and comprehend the basis for its decisions, no matter how complex.
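To make the contrast concrete, here is a toy sketch of an explicitly programmed rule in the spirit of the excerpt's autopilot example. The five-degree adjustment and the humidity trigger come from the text; the sensor threshold and function names are invented placeholders, not any real avionics code.

```python
# Toy illustration of an explicitly programmed, inspectable rule.
# The 5-degree pitch change and humidity trigger follow the excerpt;
# the threshold value below is an invented placeholder.

HUMIDITY_SURGE_THRESHOLD = 30.0  # hypothetical: % change treated as a "sudden surge"

def adjust_pitch(current_pitch: float, humidity_change: float) -> float:
    """Raise pitch by five degrees when an external sensor reports a humidity surge."""
    if humidity_change > HUMIDITY_SURGE_THRESHOLD:
        return current_pitch + 5.0  # the rule is right here: anyone can read why
    return current_pitch

print(adjust_pitch(2.0, humidity_change=45.0))  # -> 7.0, and the reason is visible in the code
```

With a rule like this, tracing a decision back to its cause is a matter of reading the code.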

With big-data analysis, however, this traceability will become much harder. The basis of an algorithm’s predictions may often be far too intricate for the average human to understand.

When computers were explicitly programmed to follow sets of instructions, as with IBM’s early program for translating Russian into English in 1954, a human could readily grasp why the software substituted one word for another. But Google Translate incorporates billions of pages of translations into its judgments about whether the English word “light” should be “lumière” or “léger” in French (that is, whether the word refers to brightness or to weight). It’s impossible for a human to trace the precise reasons for the program’s word choices because they are based on massive amounts of data and vast statistical computations.
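A minimal sketch of the statistical idea, not Google Translate's actual method: score each candidate translation by how well its typical context words match the sentence at hand. The context words and counts below are invented for illustration; a real system would draw them from billions of translated pages, which is exactly why its choices become hard to trace.

```python
# Toy word-choice-by-statistics example (not Google Translate's real algorithm).
# The co-occurrence counts are invented; imagine them harvested from a bilingual corpus.

from collections import Counter

cooccurrence = {
    "lumière": Counter({"bright": 120, "lamp": 95, "sun": 80, "shine": 60}),
    "léger":   Counter({"weight": 110, "carry": 70, "bag": 65, "feather": 50}),
}

def choose_translation(sentence: str) -> str:
    """Score each candidate by how strongly its known context words overlap the sentence."""
    words = sentence.lower().split()
    scores = {
        candidate: sum(counts[w] for w in words)
        for candidate, counts in cooccurrence.items()
    }
    return max(scores, key=scores.get)

print(choose_translation("the bag is light enough to carry"))   # -> léger
print(choose_translation("the light from the lamp is bright"))  # -> lumière
```

Even in this toy version, the "reason" for a choice is a pile of counts rather than a rule someone wrote down; scale the counts up to billions of pages and the trail becomes effectively unreadable.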

Big data operates at a scale that transcends our ordinary understanding. For example, the correlation Google identified between a handful of search terms and the flu was the result of testing 450 million mathematical models. In contrast, Cynthia Rudin initially designed 106 predictors for whether a manhole might catch fire, and she could explain to Con Edison’s managers why her program prioritized inspection sites as it did. “Explainability,” as it is called in artificial intelligence circles, is important for us mortals, who tend to want to know why, not just what. But what if instead of 106 predictors, the system automatically generated a whopping 601 predictors, the vast majority of which had very low weightings but which, when taken together, improved the model’s accuracy? The basis for any prediction might be staggeringly complex. What could she tell the managers then to convince them to reallocate their limited budget?...MORE
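A rough sketch of the trade-off the excerpt describes, not Rudin's actual Con Edison model: a few named predictors with weights a manager can inspect, versus hundreds of automatically generated predictors with tiny weights. All feature names and weights here are invented for illustration.

```python
# Contrast an explainable scoring model with one built from many weak predictors.
# Feature names and weights are invented; this is not the real manhole model.

import random

# A handful of named predictors whose weights a manager can inspect and debate.
small_model = {
    "cable_age_years": 0.40,
    "past_serious_events": 0.35,
    "neutral_wire_issues": 0.25,
}

# Hundreds of automatically generated predictors, each with a tiny weight.
# Individually they explain almost nothing; together they may lift accuracy.
random.seed(0)
large_model = {f"auto_feature_{i}": random.uniform(0.001, 0.01) for i in range(601)}

def score(model: dict, manhole: dict) -> float:
    """Weighted sum of whichever predictors are present for this manhole."""
    return sum(weight * manhole.get(name, 0.0) for name, weight in model.items())

manhole = {"cable_age_years": 1.0, "past_serious_events": 1.0}
print(score(small_model, manhole))  # easy to explain: two named factors dominate the score
```

Explaining a ranking from the small model means pointing at two or three named factors; explaining one from the large model means walking a manager through hundreds of tiny contributions, which is the "explainability" problem the excerpt is pointing at.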