The argument from statistics According to one school of statisticians, a single simple formula underlies all learning. Bayes’ theorem, as the formula is known, tells you how to
9780465065707-text.indd 31 7/16/15 12:44 PM
32 | THE MASTER ALGORITHM
update your beliefs whenever you see new evidence. A Bayesian learner starts with a set of hypotheses about the world. When it sees a new piece of data, the hypotheses that are compatible with it become more likely, and the hypotheses that aren’t become less likely (or even impossible). After seeing enough data, a single hypothesis dominates, or a few do. For example, if I’m looking for a program that accurately predicts stock movements and a stock that a candidate program had predicted would go up instead goes down, that candidate loses credibility. After I’ve reviewed a number of candidates, only a few credible ones will remain, and they will encapsulate my new knowledge of the stock market. Bayes’ theorem is a machine that turns data into knowledge. According to Bayesian statisticians, it’s the only correct way to turn data into knowledge. If they’re right, either Bayes’ theorem is the Master Algorithm or it’s the engine that drives it. Other statisticians have serious reservations about the way Bayes’ theorem is used and prefer different ways to learn from data. In the days before computers, Bayes’ theorem could only be applied to very simple problems, and the idea of it as a universal learner would have seemed far-fetched. With big data and big computing to go with it, however, Bayes can find its way in vast hypothesis spaces and has spread to every conceivable field of knowledge. If there’s a limit to what Bayes can learn, we haven’t found it yet.
The argument from computer science When I was a senior in college, I wasted a summer playing Tetris, a highly addictive video game where variously shaped pieces fall from above and which you try to pack as closely together as you can; the game is over when the pile of pieces reaches the top of the screen. Little did I know that this was my introduction to NP-completeness, the most important problem in theoretical computer science. Turns out that, far from an idle pursuit, mastering Tetris—really mastering it—is one of the most useful things you could ever do. If you can solve Tetris, you can solve thousands of the hardest and most important problems in science, technology, and management—all in one fell swoop. That’s
9780465065707-text.indd 32 7/16/15 12:44 PM
THE MASTER ALGORITHM | 33
because at heart they are all the same problem. This is one of the most astonishing facts in all of science. Figuring out how proteins fold into their characteristic shapes;
reconstructing the evolutionary history of a set of species from their DNA; proving theorems in propositional logic; detecting arbitrage opportunities in markets with transaction costs; inferring a three- dimensional shape from two-dimensional views; compressing data on a disk; forming a stable coalition in politics; modeling turbulence in sheared flows; finding the safest portfolio of investments with a given return, the shortest route to visit a set of cities, the best layout of components on a microchip, the best placement of sensors in an ecosystem, or the lowest energy state of a spin glass; scheduling flights, classes, and factory jobs; optimizing resource allocation, urban traffic flow, social welfare, and (most important) your Tetris score: these are all NP- complete problems, meaning that if you can efficiently solve one of them you can efficiently solve all problems in the class NP, including each other. Who would have guessed that all these problems, superficially so different, are really the same? But if they are, it makes sense that one algorithm could learn to solve all of them (or, more precisely, all efficiently solvable instances). P and NP are the two most important classes of problems in computer science. (The names are not very mnemonic, unfortunately.) A problem is in P if we can solve it efficiently, and it’s in NP if we can efficiently check its solution. The famous P = NP question is whether every efficiently checkable problem is also efficiently solvable. Because of NP-completeness, all it takes to answer it is to prove that one NP- complete problem is efficiently solvable (or not). NP is not the hardest class of problems in computer science, but it’s arguably the hardest “realistic” class: if you can’t even check a problem’s solution before the universe ends, what’s the point of trying to solve it? Humans are good at solving NP problems approximately, and conversely, problems that we find interesting (like Tetris) often have an “NP-ness” about them. One definition of artificial intelligence is that it consists of finding heuristic solutions to NP- complete problems. Often, we do this by reducing them to satisfiability,
9780465065707-text.indd 33 7/16/15 12:44 PM
34 | THE MASTER ALGORITHM
the canonical NP-complete problem: Can a given logical formula ever be true, or is it self-contradictory? If we invent a learner that can learn to solve satisfiability, it has a good claim to being the Master Algorithm. NP-completeness aside, the sheer existence of computers is itself a powerful sign that there is a Master Algorithm. If you could travel back in time to the early twentieth century and tell people that a soon-tobe-invented machine would solve problems in every realm of human endeavor—the same machine for every problem—no one would believe you. They would say that each machine can only do one thing: sewing machines don’t type, and typewriters don’t sew. Then in 1936 Alan Turing imagined a curious contraption with a tape and a head that read and wrote symbols on it, now known as a Turing machine. Every conceivable problem that can be solved by logical deduction can be solved by a Turing machine. Furthermore, a so-called universal Turing machine can simulate any other by reading its specification from the tape—in other words, it can be programmed to do anything. The Master Algorithm is for induction, the process of learning, what the Turing machine is for deduction. It can learn to simulate any other algorithm by reading examples of its input-output behavior. Just as there are many models of computation equivalent to a Turing machine, there are probably many different equivalent formulations of a universal learner. The point, however, is to find the first such formulation, just as Turing found the first formulation of the general-purpose computer.