When Bots Teach Themselves to Cheat

From Wired:

Once upon a time, a bot deep in a game of tic-tac-toe figured out that making improbable moves caused its bot opponent to crash. Smart. Also sassy.

Moments when experimental bots go rogue—some would call it cheating—are not typically celebrated in scientific papers or press releases. Most AI researchers strive to avoid them, but a select few document and study these bugs in the hopes of revealing the roots of algorithmic impishness. “We don’t want to wait until these things start to appear in the real world,” says Victoria Krakovna, a research scientist at Alphabet’s DeepMind unit. Krakovna is the keeper of a crowdsourced list of AI bugs. To date, it includes more than three dozen incidents of algorithms finding loopholes in their programs or hacking their environments.

The specimens collected by Krakovna and fellow bug hunters point to a communication problem between humans and machines: Given a clear goal, an algorithm can master complex tasks, such as beating a world champion at Go. But even with logical parameters, it turns out that mathematical optimization empowers bots to develop shortcuts humans didn’t think to deem off-limits. Teach a learning algorithm to fish, and it might just drain the lake.

Gaming simulations are fertile ground for bug hunting. Earlier this year, researchers at the University of Freiburg in Germany challenged a bot to score big in the Atari game Qbert. Instead of playing through the levels like a sweaty-palmed human, it invented a complicated move to trigger a flaw in the game, unlocking a shower of ill-gotten points. “Today’s algorithms do what you say, not what you meant,” says Catherine Olsson, a researcher at Google who has contributed to Krakovna’s list and keeps her own private zoo of AI bugs.

These examples may be cute, but here’s the thing: As AI systems become more powerful and pervasive, hacks could materialize on bigger stages with more consequential results. If a neural network managing an electric grid were told to save energy—DeepMind has considered just such an idea—it could cause a blackout.

“Seeing these systems be creative and do things you never thought of, you recognize their power and danger,” says Jeff Clune, a researcher at Uber’s AI lab. A recent paper that Clune coauthored, which lists 27 examples of algorithms doing unintended things, suggests future engineers will have to collaborate with, not command, their creations. “Your job is to coach the system,” he says. Embracing flashes of artificial creativity may be the solution to containing them.

. . . .

  • Infanticide: In a survival simulation, one AI species evolved to subsist on a diet of its own children.

. . . .

  • Optical Illusion: Humans teaching a gripper to grasp a ball accidentally trained it to exploit the camera angle so that it appeared successful—even when not touching the ball.

Link to the rest at Wired

4 thoughts on “When Bots Teach Themselves to Cheat”

  1. I have one question, maybe two, maybe one question in two parts, for the AI crowd.

    Is the software changing its own code? Or is it changing its working parameters; that is, data?

    • See “JustAReader” below – excellent points.

      However, as a single system, there are only a few (highly experimental, and not very successful to date) AIs that modify their own code.

      Most AIs are based on “weighting.” In order to reach their goal, they tweak a set of weights – data – that tell them which already existing code should be executed (or, more commonly, change the output of functions that take those weights as their input). They compare the result of the change to their goal, and discard changes that do not progress towards that goal. (Note, the tweaks can come from an algorithm that is coded into the AI, or from a pseudo-random number generator, or both.) Pretty much, this is the way that biological intelligence works – burn your hand on a stove, and that weight goes to zero – follow a novel line of thinking that results in a more correct view of, say, supernova physics, and that weight gets an increase.
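The tweak-compare-discard loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular AI system is built: the names `hill_climb`, `predict`, `goal`, and `SAMPLES` are hypothetical, the tweaks come from a pseudo-random number generator, and the "goal" is simply how closely a fixed function reproduces the line y = 2x + 1.

```python
import random

def hill_climb(goal, weights, steps=2000, step_size=0.1, seed=0):
    """Randomly tweak the weights; keep a tweak only if it scores better on the goal."""
    rng = random.Random(seed)
    best = list(weights)
    best_score = goal(best)
    for _ in range(steps):
        # Propose a small pseudo-random change to every weight.
        candidate = [w + rng.uniform(-step_size, step_size) for w in best]
        score = goal(candidate)
        if score > best_score:
            # Progress toward the goal: keep the change.
            best, best_score = candidate, score
        # Otherwise the change is discarded and the old weights stay.
    return best, best_score

# The "already existing code": its instructions never change, only its weights do.
def predict(weights, x):
    slope, offset = weights
    return slope * x + offset

# Hypothetical target behavior that the goal rewards: y = 2x + 1.
SAMPLES = [(x, 2.0 * x + 1.0) for x in range(-5, 6)]

def goal(weights):
    # Higher is better: the negative of the total squared error.
    return -sum((predict(weights, x) - y) ** 2 for x, y in SAMPLES)

start = [0.0, 0.0]
learned, learned_score = hill_climb(goal, start)
print(learned, learned_score)
```

Note that `predict` is never rewritten. Only the two numbers in `weights` change, which is the sense in which such a system modifies data rather than code.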

      The main problem in AI is defining the goal properly. Bad and/or incomplete definitions of the goal lead to bad behavior. This is also very much in line with biobrains – a feral animal has different goals than a domesticated one – just as a sociopath or psychopath has different goals than the sane.
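A toy version of the electric-grid example from the article shows how an incomplete goal produces bad behavior. Everything here is a made-up assumption for illustration: the demand figure, the candidate settings, and the penalty weight of 10 do not come from any real system. The "optimizer" is just Python's `max` over a handful of settings.

```python
# Hypothetical toy grid: a setting is the fraction of demand that gets supplied.
DEMAND_MWH = 100.0
SETTINGS = [i / 10 for i in range(16)]  # 0.0, 0.1, ..., 1.5

def energy_used(fraction):
    return fraction * DEMAND_MWH

def unmet_demand(fraction):
    # Demand left unserved; zero once supply meets or exceeds demand.
    return max(0.0, 1.0 - fraction) * DEMAND_MWH

def stated_goal(fraction):
    # "Save energy", taken literally: less energy used is always better.
    return -energy_used(fraction)

def intended_goal(fraction):
    # What the engineers meant: save energy, but not by cutting customers off.
    # The penalty weight of 10 is an arbitrary illustrative choice.
    return -energy_used(fraction) - 10.0 * unmet_demand(fraction)

best_literal = max(SETTINGS, key=stated_goal)
best_intended = max(SETTINGS, key=intended_goal)

print("literal goal picks:", best_literal)    # supplies nothing: a blackout
print("intended goal picks:", best_intended)  # meets demand without oversupply
```

The optimizer is identical in both cases. Only the definition of the goal differs, and that alone decides whether the result is a blackout or a grid that meets demand without waste.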

  2. The distinction is not really a clean one. Even in traditional, non-AI applications, one program’s code is another program’s data. Programmers use editor programs to write source code. Compilers treat that source code as input data and emit the machine code as output data. Operating systems move that machine code around as data and decide when it will be launched.

    Many application programs are best viewed as layers of “machines” that each take as data the next-more-abstract level of code and carry out the instructions in that code.
    You might think a web page is just a page of text and pictures, but to a web browser it is a set of instructions to be followed, in other words, a program. You might think that PDF file contains a document, but PDF is really a programming language that is executed by a printer or PDF viewer. (I’ve coded in “raw” PDF; it’s not fun.) In fact, PDF is a perfect example of program code that is usually written by other programs without direct inspection by a human programmer.

    In the layered view of an application, learning a new set of data parameters in layer K that will be then used to control future behaviors of layer K+1 is pretty much the same thing as learning a new behavior for layer K+1. It all depends on what layer of the software you are looking at in any given moment.
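The layered view can be made concrete with a small sketch. The names `run` and `make_scaler` are hypothetical and the two-instruction "language" is invented for illustration. The upper layer is a tiny interpreter, and what it executes is an ordinary Python list produced by the layer beneath it.

```python
def run(program, x):
    """Layer K+1: a tiny interpreter. `program` is plain data, a list of (op, arg) pairs."""
    for op, arg in program:
        if op == "add":
            x += arg
        elif op == "mul":
            x *= arg
        else:
            raise ValueError(f"unknown op: {op}")
    return x

def make_scaler(factor):
    """Layer K: emits data that layer K+1 will then execute as code."""
    return [("mul", factor)]

# The same interpreter behaves differently depending on the data it is handed.
double_then_inc = [("mul", 2), ("add", 1)]
inc_then_double = [("add", 1), ("mul", 2)]

print(run(double_then_inc, 5))  # 5 * 2 + 1 = 11
print(run(inc_then_double, 5))  # (5 + 1) * 2 = 12
print(run(make_scaler(3), 5))   # 5 * 3 = 15
```

From the point of view of `run`, changing the list is changing the program. From the point of view of `make_scaler`, the list is just output data. Whether a change counts as "new code" or "new parameters" depends on which layer is being looked at.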
