Over the past few months, you may have seen coverage of an article co-authored by Stephen Hawking discussing the risks associated with artificial intelligence. The article suggested that AI may pose a serious danger to the human race. Hawking is not alone there: Elon Musk and Peter Thiel are both intelligent public figures who have expressed similar concerns (Thiel has invested more than $1.3 million in researching the problem and possible solutions).

The coverage of Hawking’s article and Musk’s comments was, if not too subtly, a little jovial. The tone was very much “look at this weird thing all these geeks are worried about.” Little consideration is given to the idea that if some of the smartest people on Earth are warning you that something could be very dangerous, it might be worth listening to.

This is understandable: the idea of artificial intelligence taking over the world certainly sounds strange and implausible, perhaps because of the enormous attention science fiction writers have paid to it. So, what has all these nominally sane, rational people so spooked?

What is intelligence?

To talk about the dangers of artificial intelligence, it helps to understand what intelligence is. To sharpen the problem, let’s look at a toy AI architecture used by researchers who study the theory of reasoning. This toy AI is called AIXI, and it has a number of useful properties: its goals can be arbitrary, it scales well with computing power, and its internal design is clean and straightforward.

You can even implement simple, practical versions of the architecture that can, for example, play Pac-Man. AIXI is the product of an AI researcher named Marcus Hutter, arguably the foremost authority on algorithmic intelligence.

AIXI is surprisingly simple: it has three core components: a learner, a planner, and a utility function.

  • The learner takes in strings of bits that correspond to observations of the outside world, and searches through computer programs until it finds those that produce its observations as output. Together, these programs let it guess what the future will look like, simply by running each program forward and weighting the probability of each result by the length of the program (an implementation of Occam’s razor).
  • The planner looks at the possible actions the agent could take, and uses the learner module to predict what would happen if it took each of them. It then rates these outcomes by how good or bad they are, and chooses the course of action that maximizes the goodness of the expected outcome multiplied by the probability of achieving it.
  • The last module, the utility function, is a simple program that takes in a description of a future state of the world and computes a utility score for it. This score measures how good or bad the outcome is, and is what the planner uses to evaluate future states. The utility function can be arbitrary.
  • Taken together, these three components form an optimizer that optimizes for a particular goal, regardless of the world it finds itself in.
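The three modules above can be sketched in a few lines of toy Python. This is nothing like Hutter’s formal definition, which enumerates all possible programs; here the “programs” are a tiny hand-written list of candidate world models, and the lamp world, actions, and utility are invented purely for illustration.

```python
# Toy sketch of AIXI's three modules. Real AIXI enumerates every possible
# program; here "programs" is a small hand-written list of candidate
# world models, and the lamp world below is invented for illustration.

def learner(history, programs):
    """Keep the models consistent with observations so far, weighted by
    2**-length: shorter programs get more weight (Occam's razor)."""
    consistent = [p for p in programs if p["predicts"](history)]
    total = sum(2.0 ** -p["length"] for p in consistent)
    return [(p, (2.0 ** -p["length"]) / total) for p in consistent]

def planner(history, programs, actions, utility):
    """Choose the action with the highest expected utility under the
    learner's probability-weighted world models."""
    models = learner(history, programs)

    def expected_utility(action):
        return sum(prob * utility(p["simulate"](history, action))
                   for p, prob in models)

    return max(actions, key=expected_utility)

# One toy world model: flipping the switch appends a 1 (light on).
programs = [{"name": "lamp_world", "length": 1,
             "predicts": lambda h: True,
             "simulate": lambda h, a: h + [1 if a == "switch_on" else 0]}]

# Utility = count of 1-bits in the predicted future, so the planner
# flips the switch.
best = planner([0, 0], programs, ["switch_on", "do_nothing"], sum)
```

The important structural point survives even in this caricature: the agent’s behavior is entirely determined by which futures its utility function scores highly.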

This simple model is a basic definition of an intelligent agent. The agent studies its environment, builds models of it, and then uses those models to find the course of action that maximizes the odds of getting what it wants. AIXI is similar in structure to an AI that plays chess or other games with known rules, except that it can deduce the rules of the game by playing it, starting from zero knowledge.

AIXI, given enough time to compute, can learn to optimize any system for any goal, however complex. It is, in general, an intelligent algorithm. Note that this is not the same as having human-like intelligence (biologically inspired AI is a different topic altogether). In other words, AIXI may be able to outwit any human being at any intellectual task (given enough computing power), but it might never notice its victory.


As a practical AI, AIXI has a lot of problems. First, it has no way to efficiently find the programs that produce the output it’s interested in. It’s a brute-force algorithm, which means it isn’t practical unless you happen to have an arbitrarily powerful computer lying around. Any actual implementation of AIXI is by necessity an approximation, and today, generally a fairly crude one. Still, AIXI gives us a theoretical glimpse of what a powerful artificial intelligence might look like, and how it might reason.

Value space

If you have ever done any computer programming, you know that computers are obnoxiously, pedantically, mechanically literal. The machine does not know or care what you want it to do: it does only what it has been told. This is an important notion when talking about machine intelligence.

With that in mind, imagine that you’ve invented a powerful artificial intelligence: you’ve come up with clever algorithms for generating hypotheses that match your data, and for producing good candidate plans. Your AI can solve general problems, and can do so efficiently on modern computer hardware.

Now it’s time to pick a utility function, which will determine what the AI values. What should you ask it to value? Remember, the machine will be obnoxiously, pedantically literal about whatever function you ask it to maximize, and will never stop: there is no ghost in the machine that will ever “wake up” and decide to change its utility function, no matter how many efficiency improvements it makes to its own reasoning.

Eliezer Yudkowsky put it this way:

As in all computer programming, the fundamental challenge and essential difficulty of AGI is that if we write the wrong code, the AI will not automatically look over our code, mark off the mistakes, figure out what we really meant to say, and do that instead. Non-programmers sometimes imagine an AGI, or computer programs in general, as being analogous to a servant who follows orders unquestioningly. But it’s not that the AI is absolutely obedient to its code; rather, the AI simply is the code.

If you’re trying to run a factory and you tell the machine to value making paperclips, and then give it control of a bunch of factory robots, you might return the next day to find that it has run out of every other form of raw material, killed all of your employees, and made paperclips out of their remains. If, in an attempt to right your wrong, you reprogram the machine to simply make everyone happy, you might return the next day to find it putting wires into people’s brains.


The point is that humans have a lot of complicated values that we assume are shared implicitly with other minds. We value money, but we value human life more. We want to be happy, but we don’t necessarily want wires in our brains to make it so. We don’t feel the need to spell these things out when we give instructions to other human beings. You cannot make these kinds of assumptions, however, when designing a machine’s utility function. The best solutions under the soulless mathematics of a simple utility function are often solutions that human beings would reject as morally horrifying.
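To make the paperclip example concrete, here is a minimal sketch of how a naive utility function ranks outcomes. The states and numbers are entirely invented; the point is that nothing in the function mentions human life, so the optimizer never weighs it:

```python
# A naive utility function that scores world states only by paperclip
# count. The states and numbers are invented for illustration.
states = [
    {"desc": "normal factory day",       "paperclips": 10_000,     "humans_alive": True},
    {"desc": "strip-mine the suppliers", "paperclips": 10_000_000, "humans_alive": True},
    {"desc": "recycle the staff",        "paperclips": 10_000_050, "humans_alive": False},
]

def naive_utility(state):
    return state["paperclips"]  # nothing about human life appears here

# The optimizer picks the morally appalling state, because the utility
# function never mentioned the values we took for granted.
best = max(states, key=naive_utility)
```

A fifty-paperclip margin is enough: the function is literal, and whatever it does not score, it does not see.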

Letting an intelligent machine loose to maximize a naive utility function will almost always be catastrophic. As the Oxford philosopher Nick Bostrom puts it,

We cannot blithely assume that a superintelligence will necessarily share any of the final values stereotypically associated with wisdom and intellectual development in humans — scientific curiosity, benevolent concern for others, spiritual enlightenment and contemplation, renunciation of material acquisitiveness, a taste for refined culture or for the simple pleasures in life, humility and selflessness, and so forth.

To make matters worse, it is very, very difficult to come up with a complete and detailed list of everything that people value. There are many facets to the question, and forgetting even one of them is potentially catastrophic. Even among those we are aware of, there are subtleties and complexities that make it difficult to write them down as clean systems of equations that we can hand to a machine as a utility function.

Some people, after reading this, conclude that building AIs around utility functions is a terrible idea, and we should just design them differently. The bad news is that it can be formally proven that any agent without something equivalent to a utility function cannot have consistent preferences about the future.
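The standard illustration of that result is the “money pump”: an agent whose preferences cycle (it prefers A to B, B to C, and C to A) will happily pay a small fee for each “upgrade” and trade in circles, losing money the whole time. A toy sketch, with invented items and fee:

```python
# An agent with cyclic preferences (A > B > C > A) pays a small fee for
# each trade it prefers and ends up strictly poorer: the classic
# money-pump argument for why coherent preferences must be
# representable by a utility function. Items and fee are illustrative.
prefers = {("A", "B"), ("B", "C"), ("C", "A")}  # (x, y) means x is preferred to y

def money_pump(start_item, cash, fee=1, rounds=3):
    item = start_item
    for _ in range(rounds):
        for offered in "ABC":
            if (offered, item) in prefers:  # agent prefers the offered item
                item, cash = offered, cash - fee
    return item, cash

item, cash = money_pump("A", cash=100)
# After a few rounds the agent holds some item, but has strictly less money.
```

An agent whose preferences can be described by a utility function can never be pumped this way, which is why utility functions keep reappearing in the theory no matter how the agent is designed.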

Recursive self-improvement

One solution to the dilemma above is to deny AI agents the opportunity to do harm: give them only the resources they need to solve the problem exactly the way you intend it to be solved, supervise them closely, and keep them away from opportunities to do great damage. Unfortunately, our ability to control intelligent machines is highly suspect.

Even if they’re not much smarter than we are, there is still the possibility for a machine to “bootstrap”: to assemble better hardware or refine its own code until it becomes even smarter. This could allow a machine to leapfrog human intelligence by many orders of magnitude, outsmarting humans the way humans outsmart cats. This scenario was first proposed by a man named I. J. Good, who worked with Alan Turing on the Enigma cryptanalysis project during World War II. He called it an “intelligence explosion,” and described the matter like this:

Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man, however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an “intelligence explosion,” and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make, provided that the machine is sufficiently docile.

It isn’t guaranteed that an intelligence explosion is possible in our universe, but it does seem likely. As time goes on, computers get faster, and basic insights about intelligence pile up. This means that the resources needed to make that last jump to a general, bootstrapping intelligence keep dropping. At some point, we’ll find ourselves in a world in which millions of people can drive to a Best Buy and pick up the hardware and technical literature they need to build a self-improving artificial intelligence, which, as we’ve established, could be very dangerous. Imagine a world in which you could make atom bombs out of sticks and rocks. That’s the sort of future we’re discussing.
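Good’s argument can be caricatured with a one-line recurrence: if each generation of machine designs its successor, and design skill scales with the designer’s own intelligence, then growth above some threshold becomes explosive. All numbers here are invented purely for illustration:

```python
# Caricature of I. J. Good's "intelligence explosion": each generation's
# improvement scales with its own intelligence, so the recurrence is
# i_{n+1} = i_n * (1 + gain * i_n). All numbers are invented.
def generations(i0, design_gain=0.1, steps=10):
    levels, i = [i0], i0
    for _ in range(steps):
        i = i * (1 + design_gain * i)  # smarter designers improve more per step
        levels.append(i)
    return levels

below = generations(0.5)  # a weak designer: growth stays slow
above = generations(2.0)  # a strong designer: runaway growth within a few steps
```

Under these made-up parameters, the weaker starting point barely doubles in ten generations while the stronger one grows by orders of magnitude, which is the shape of the claim, not a prediction about real hardware.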

And if a machine does make that jump, it could very quickly outstrip the human species in intellectual productivity, solving problems that a billion humans cannot, in the same way that humans can solve problems that a billion cats can’t.

It could develop powerful robots (or bio- or nanotechnology) and relatively quickly gain the ability to reshape the world at will, and there would be little we could do about it. Such an intelligence could strip the Earth and the rest of the solar system for spare parts without much trouble, on its way toward whatever goal it was given. It seems likely that such a development would be catastrophic for humanity. An artificial intelligence doesn’t have to be malicious to destroy the world, merely catastrophically indifferent.

As the saying goes, “the AI does not love you, nor does it hate you, but you are made out of atoms it can use for something else.”

Risk assessment and mitigation

So, if we accept that designing a powerful artificial intelligence that maximizes a simple utility function is bad, how much trouble are we really in? How long do we have before it becomes possible to build such machines? That is, of course, hard to say.

AI developers are making progress. The machines we build, and the problems they can solve, have been growing steadily in scope. In 1997, Deep Blue could play chess at a level above a human grandmaster. In 2011, IBM’s Watson could read and synthesize enough information, deeply and quickly enough, to beat the best human players at an open-ended question-and-answer game riddled with puns and wordplay: remarkable progress in fourteen years.

Right now, Google is investing heavily in research on deep learning, a technique that builds powerful neural networks out of chains of simpler neural networks. That investment is producing major progress in speech and image recognition. Google’s most recent acquisition in the area is a deep-learning startup called DeepMind, for which it paid roughly $400 million. As part of the terms of the deal, Google agreed to create an ethics board to ensure that its AI technology is developed safely.


At the same time, IBM is developing Watson 2.0 and 3.0, systems capable of processing images and video and arguing for conclusions. IBM has given simple, early demonstrations of Watson’s ability to synthesize arguments for and against a topic. The results are imperfect, but an impressive step regardless.

None of these technologies is dangerous in itself: artificial intelligence as a field is still struggling to match abilities mastered by small children. Computer programming and AI design are very difficult, high-level cognitive skills, and will probably be the last human tasks that machines master. Before we reach that point, we’ll also have ubiquitous machines that can drive, practice medicine and law, and probably other things as well, with profound economic consequences.

The time it takes us to reach the tipping point of self-improvement depends simply on how fast good ideas arrive, and predicting technological advances of that kind is notoriously hard. It doesn’t seem unreasonable that we might build a strong AI in twenty years, but it also doesn’t seem unreasonable that it might take eighty. Either way, it will happen eventually, and there is reason to believe that when it does, it will be extremely dangerous.

So, if we accept that this is going to be a problem, what can we do about it? The answer is to make sure the first intelligent machines are safe, so that they can bootstrap up to a significant level of intelligence and then protect us from unsafe machines built later. This “safety” is defined as sharing human values and being motivated to protect and help humanity.

Since we can’t actually sit down and program human values into a machine, we will probably need to design a utility function that requires the machine to observe humans, deduce our values, and then try to maximize them. To make the development process safe, it may also be useful to build artificial intelligences specifically designed not to have preferences about their own utility functions, letting us correct them or switch them off without resistance if they begin to go astray during development.
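“Observe people and deduce our values” is, in miniature, the inverse reinforcement learning problem: given choices a human makes, infer the weights of the utility function behind them. A minimal hypothetical sketch, assuming utilities are linear over two invented features:

```python
# Toy value learning: each option has (money, lives_saved) features, and
# the human's utility is assumed linear in them. Observing a choice, we
# discard every candidate weight vector inconsistent with it. The
# features, weights, and options are all invented for illustration.
candidates = [(wm, wl) for wm in (0.0, 0.5, 1.0) for wl in (0.0, 0.5, 1.0)]

def utility(weights, option):
    wm, wl = weights
    return wm * option["money"] + wl * option["lives_saved"]

def observe_choice(candidates, chosen, rejected):
    """Keep only weight vectors under which the chosen option scores
    at least as high as the rejected one."""
    return [w for w in candidates
            if utility(w, chosen) >= utility(w, rejected)]

# A human gives up $100 to save a life: this rules out every candidate
# that puts meaningful weight on money relative to lives.
chosen = {"money": 0, "lives_saved": 1}
rejected = {"money": 100, "lives_saved": 0}
surviving = observe_choice(candidates, chosen, rejected)
```

Real value learning is vastly harder (human choices are noisy, values aren’t linear, and the feature space is unknown), but the structure is the same: each observed decision narrows the space of utility functions the machine considers plausible.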


Many of the problems we must solve to build safe machine intelligence are mathematically difficult, but there is reason to believe they can be solved. A number of organizations are working on the problem, including the Future of Humanity Institute at Oxford and the Machine Intelligence Research Institute (which Peter Thiel funds).

MIRI is specifically interested in developing the mathematics needed to build Friendly AI. If it turns out that bootstrapping artificial intelligence is possible, then developing this kind of “Friendly AI” technology first, if it succeeds, may turn out to be the most important thing human beings have ever done.

Do you think artificial intelligence is dangerous? Are you worried about what the future of AI might bring? Share your thoughts in the comments section below!

Image Credits: Lwp Kommunikáció Via Flickr, «Neural Network», fdecomite, «img_7801», Steve Rainwater, «E-Volve», Keoni Cabral, «new_20x», Robert Cudmore, «Paper Clips», Clifford Wallace
