Over the past few months, you may have seen coverage of an article co-authored by Stephen Hawking discussing the risks associated with artificial intelligence. The article suggested that AI may pose a serious danger to humanity. Hawking is not alone there: Elon Musk and Peter Thiel are both intelligent public figures who have expressed similar concerns (Thiel has invested more than $1.3 million into researching the problem and possible solutions).
The coverage of Hawking's article and Musk's comments has been, if not outright mocking, at least a little flippant. The tone has been very much "look at this weird thing all these geeks are worried about." Little attention is paid to the idea that if some of the smartest people on Earth are warning you that something could be very dangerous, it might be worth listening to.
This is understandable: the idea of artificial intelligence taking over the world certainly sounds strange and implausible, perhaps because of the enormous attention science fiction writers have already paid to it. So what is it that has all these nominally sane, rational people so frightened?
What is intelligence?

To talk about the dangers of artificial intelligence, it helps to understand what intelligence is. To get a better sense of the problem, let's look at a toy AI architecture used by researchers who study the theory of reasoning. This toy AI is called AIXI, and it has a number of useful properties: its goals can be arbitrary, it scales well with computing power, and its internal design is clean and simple.
You can even implement simple, practical approximations of the architecture that can, for example, play Pac-Man. AIXI is the work of an AI researcher named Marcus Hutter, arguably the foremost authority on algorithmic intelligence.
AIXI is surprisingly simple. It has three main components: a learner, a planner, and a utility function.
- The learner takes in strings of bits that correspond to inputs about the outside world, and searches through computer programs until it finds ones that produce its observations as output. Together, these programs allow it to make guesses about what the future will look like, simply by running each program forward and weighting the probability of each result by the length of the program that produced it (an implementation of Occam's razor).
- The planner looks at the possible actions the agent could take and uses the learner module to predict what would happen if it took each of them. It then rates those outcomes according to how good or bad they are, and chooses the course of action that maximizes the goodness of the expected outcome multiplied by the expected probability of achieving it.
- The utility function, the last module, is a simple program that takes in a description of a future state of the world and computes a utility score for it. This score measures how good or bad that outcome is, and it is what the planner uses to evaluate possible future states. The utility function can be arbitrary.
- Taken together, these three components form an optimizer: something that optimizes for a particular goal, regardless of the world it happens to find itself in (a rough code sketch follows this list).
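To make that three-part structure concrete, here is a loose Python sketch of an AIXI-flavored agent. It is not Hutter's actual formalism: the hand-written candidate models, the "program length" numbers, and the toy utility function are all invented for illustration. Only the overall shape, a learner that weights hypotheses by simplicity and a planner that maximizes expected utility, reflects the description above.

```python
# A minimal, illustrative sketch of an AIXI-style agent, not Hutter's actual
# formalism. The candidate models, their "lengths", and the utility function
# are invented for this example.
import math

# --- Learner: candidate "programs" that predict the next observation -------
# Each model maps (history, action) -> predicted observation and carries a
# rough "program length" used for an Occam-style prior (shorter = more likely).
MODELS = [
    {"name": "always_zero", "length": 3, "predict": lambda history, action: 0},
    {"name": "echo_action", "length": 5, "predict": lambda history, action: action},
    {"name": "repeat_last", "length": 7,
     "predict": lambda history, action: history[-1] if history else 0},
]

def model_weights(models):
    """Weight each model by 2^-length (a simplicity prior), then normalize."""
    raw = [2.0 ** -m["length"] for m in models]
    total = sum(raw)
    return [w / total for w in raw]

# --- Utility function: scores a predicted future observation ---------------
def utility(observation):
    """Toy utility: this agent simply prefers larger observations."""
    return float(observation)

# --- Planner: expected-utility maximization over candidate actions ---------
def choose_action(history, actions):
    weights = model_weights(MODELS)
    best_action, best_value = None, -math.inf
    for action in actions:
        expected = sum(
            w * utility(m["predict"](history, action))
            for w, m in zip(weights, MODELS)
        )
        if expected > best_value:
            best_action, best_value = action, expected
    return best_action, best_value

if __name__ == "__main__":
    history = [1, 0, 1]   # past observations
    actions = [0, 1, 2]   # available actions this step
    action, value = choose_action(history, actions)
    print(f"chosen action: {action}, expected utility: {value:.3f}")
```

Nothing here "wants" anything in a human sense; it just scores futures with whatever utility function it was given and picks the action with the best score, which is exactly the point the rest of this article turns on.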
This simple model is a basic definition of an intelligent agent. The agent studies its environment, builds models of it, and then uses those models to find the course of action that maximizes the odds of it getting what it wants. AIXI is similar in structure to an AI that plays chess or other games with known rules, except that it is able to deduce the rules of the game by playing it, starting from zero knowledge.
AIXI, given enough time to compute, can learn to optimize any system for any goal, however complex. It is a generally intelligent algorithm. Note that this is not the same thing as having human-like intelligence (AI inspired by the human brain is a different topic altogether). In other words, AIXI may be able to outwit any human being at any intellectual task (given enough computing power), but it might not be aware of its victory.
As a practical AI, AIXI has a lot of problems. First, it has no efficient way to find the programs that produce the output it is interested in. It is a brute-force algorithm, which means it is not practical unless you happen to have an arbitrarily powerful computer lying around. Any actual implementation of AIXI is necessarily an approximation, and (today) generally a fairly crude one. Still, AIXI gives us a theoretical glimpse of what a powerful artificial intelligence might look like, and how it might reason.
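For a rough sense of why brute-force program search is hopeless, here is a back-of-the-envelope calculation. The assumptions, that candidate programs are plain bitstrings and that we could test a billion of them per second, are ours, purely for illustration.

```python
# Back-of-the-envelope illustration of why brute-force program search is
# intractable: the number of bitstring programs of length n is 2^n.
# The "one billion checks per second" rate is an arbitrary assumption.

CHECKS_PER_SECOND = 1e9
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

for n in (20, 40, 60, 80, 100):
    programs = 2 ** n
    years = programs / CHECKS_PER_SECOND / SECONDS_PER_YEAR
    print(f"length {n:>3} bits: {programs:.2e} programs, "
          f"~{years:.2e} years to enumerate")
```

Even at 100 bits, far smaller than any interesting program, exhaustive search would take on the order of tens of trillions of years, which is why real systems settle for crude approximations.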
Value space
If you have ever done any computer programming, you know that computers are obnoxiously, pedantically, mechanically literal. The machine does not know or care what you want it to do: it does only what it has been told to do. This is an important notion when talking about machine intelligence.
With that in mind, imagine that you have invented a powerful artificial intelligence: you have come up with clever algorithms for generating hypotheses that match your data, and for producing good candidate plans. Your AI can solve general problems, and it can do so efficiently on modern computing hardware.
Now it is time to pick a utility function, which will determine what the AI values. What should you ask it to value? Remember, the machine will be obnoxiously, pedantically literal about whatever function you ask it to maximize, and it will never stop: there is no ghost in the machine that will ever "wake up" and decide to change its utility function, no matter how much it improves its own reasoning.
Eliezer Yudkowsky put it this way:
As with all computer programming, the fundamental challenge and essential difficulty of AGI is that if we write the wrong code, the AI will not automatically look over our code, mark off the mistakes, figure out what we really meant to say, and do that instead. Non-programmers sometimes imagine AGI, or computer programs in general, as being like a servant that follows orders unconditionally. But it is not that the AI is absolutely obedient to its code; rather, the AI simply is the code.
If you are trying to run a factory and you tell the machine to value making paperclips, and then give it control of a bunch of factory robots, you might return the next day to find that it has exhausted every other raw material, killed all of your employees, and made paperclips out of their remains. If, in an attempt to right your wrong, you reprogram the machine to simply make everyone happy, you might return the next day to find it putting wires into people's brains.
The point here is that humans have a lot of complicated values that we assume are shared implicitly by other minds. We value money, but we value human life more. We want to be happy, but we would rather not have wires stuck into our brains to get there. We do not feel the need to spell these things out when giving instructions to other human beings. You cannot make such assumptions, however, when designing a machine's utility function. The best solutions under the soulless mathematics of a simple utility function are often solutions that human beings would reject as morally appalling.
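As a toy illustration of that point, here is a small Python sketch. The plans, the paperclip counts, and the "human_cost" numbers are all made up; the only thing it demonstrates is that a maximizer is blind to any value its utility function does not mention.

```python
# Toy illustration of the "naive utility function" failure mode described
# above. All plans and numbers are invented; the point is only that the
# maximizer ignores anything its utility function does not score.

plans = [
    {"name": "run the factory normally",   "paperclips": 1_000, "human_cost": 0},
    {"name": "exhaust all raw materials",  "paperclips": 5_000, "human_cost": 50_000},
    {"name": "recycle the employees",      "paperclips": 9_000, "human_cost": 1_000_000},
]

def naive_utility(plan):
    """Only paperclips count; human cost is simply invisible."""
    return plan["paperclips"]

def value_laden_utility(plan):
    """Same goal, but with the implicit human values made explicit."""
    return plan["paperclips"] - plan["human_cost"]

print("naive choice:      ", max(plans, key=naive_utility)["name"])
print("value-laden choice:", max(plans, key=value_laden_utility)["name"])
```

The naive maximizer cheerfully picks the worst plan, and the choice only flips once the implicit human values are written into the utility function explicitly, which is exactly the burden the paperclip story is pointing at.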
Letting an intelligent machine maximize a naive utility function will almost always be catastrophic. As the Oxford philosopher Nick Bostrom puts it,