A bot that watched 70,000 hours of Minecraft could unlock AI’s next big thing

From MIT Technology Review:

OpenAI has built the best Minecraft-playing bot yet by making it watch 70,000 hours of video of people playing the popular computer game. It showcases a powerful new technique that could be used to train machines to carry out a wide range of tasks by binging on sites like YouTube, a vast and untapped source of training data.

The Minecraft AI learned to perform complicated sequences of keystrokes and mouse clicks to complete tasks in the game, such as chopping down trees and crafting tools. It’s the first bot that can craft so-called diamond tools, a task that typically takes good human players 20 minutes of high-speed clicking—or around 24,000 actions.

The result is a breakthrough for a technique known as imitation learning, in which neural networks are trained to perform tasks by watching humans do them. Imitation learning can be used to train AI to control robot arms, drive cars, or navigate webpages.

There is a vast amount of video online showing people doing different tasks. By tapping into this resource, the researchers hope to do for imitation learning what GPT-3 did for large language models. “In the last few years we’ve seen the rise of this GPT-3 paradigm where we see amazing capabilities come from big models trained on enormous swathes of the internet,” says Bowen Baker at OpenAI, one of the team behind the new Minecraft bot. “A large part of that is because we’re modeling what humans do when they go online.”

The problem with existing approaches to imitation learning is that video demonstrations need to be labeled at each step: doing this action makes this happen, doing that action makes that happen, and so on. Annotating by hand in this way is a lot of work, and so such datasets tend to be small. Baker and his colleagues wanted to find a way to turn the millions of videos that are available online into a new dataset.

The team’s approach, called Video Pre-Training (VPT), gets around the bottleneck in imitation learning by training another neural network to label videos automatically. They first hired crowdworkers to play Minecraft, and recorded their keyboard and mouse clicks alongside the video from their screens. This gave the researchers 2,000 hours of annotated Minecraft play, which they used to train a model to match actions to onscreen outcomes. Clicking a mouse button in a certain situation makes the character swing its axe, for example.

The next step was to use this model to generate action labels for 70,000 hours of unlabeled video taken from the internet and then train the Minecraft bot on this larger dataset.
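The two-stage pipeline described above can be sketched in miniature. This is an illustrative toy, not OpenAI's actual code: the states, actions, and the voting-table "models" are all hypothetical stand-ins for the real neural networks, chosen only to make the data flow (label a small set by hand, pseudo-label a large set automatically, then clone the behavior) concrete.

```python
# Toy sketch of the VPT idea (hypothetical data and models).
# Stage 1: fit an inverse dynamics model (IDM) on a small hand-labeled
#          set of (frame transition, action) pairs.
# Stage 2: use the IDM to pseudo-label a much larger unlabeled corpus.
# Stage 3: behavior-clone a policy from the pseudo-labeled data.
from collections import Counter

# Stage 1: the "IDM" here is just a majority vote per observed
# transition, standing in for a trained network.
labeled = [
    ((0, 1), "move_forward"),
    ((1, 2), "move_forward"),
    ((2, 2), "swing_axe"),
    ((2, 3), "move_forward"),
]
votes = {}
for transition, action in labeled:
    votes.setdefault(transition, Counter())[action] += 1
idm = {t: c.most_common(1)[0][0] for t, c in votes.items()}

# Stage 2: pseudo-label a larger "unlabeled" stream of transitions.
unlabeled = [(0, 1), (1, 2), (2, 2), (2, 2), (2, 3), (0, 1)]
pseudo_labeled = [(t, idm[t]) for t in unlabeled if t in idm]

# Stage 3: behavior cloning - the policy picks the most frequent
# pseudo-labeled action for each starting state.
policy_votes = {}
for (state, _next_state), action in pseudo_labeled:
    policy_votes.setdefault(state, Counter())[action] += 1
policy = {s: c.most_common(1)[0][0] for s, c in policy_votes.items()}

print(policy)
```

The point of the sketch is the leverage: the hand-labeled set is small, but once the IDM exists, every additional hour of unlabeled video becomes training data for the policy at no extra annotation cost.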

“Video is a training resource with a lot of potential,” says Peter Stone, executive director of Sony AI America, who has previously worked on imitation learning. 

Imitation learning is an alternative to reinforcement learning, in which a neural network learns to perform a task from scratch via trial and error. This is the technique behind many of the biggest AI breakthroughs in the last few years. It has been used to train models that can beat humans at games, control a fusion reactor, and discover a faster way to do fundamental math.
The problem is that reinforcement learning works best for tasks that have a clear goal, where random actions can lead to accidental success. Reinforcement learning algorithms reward those accidental successes to make them more likely to happen again.
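The "reward accidental successes to make them more likely" loop can be shown with a minimal sketch. This is a generic epsilon-greedy bandit, not any system from the article; the action names, learning rate, and reward function are all made up for illustration.

```python
# Minimal sketch of reinforcement learning from accidental success
# (hypothetical two-action example, not OpenAI's setup).
import random

random.seed(0)
values = {"a": 0.0, "b": 0.0}  # estimated value of each action
alpha = 0.5                    # learning rate

def reward(action):
    # Only action "b" ever reaches the goal; the agent can discover
    # this only by stumbling onto it during random exploration.
    return 1.0 if action == "b" else 0.0

for step in range(200):
    # Epsilon-greedy: mostly exploit current estimates,
    # occasionally explore at random.
    if random.random() < 0.2:
        action = random.choice(["a", "b"])
    else:
        action = max(values, key=values.get)
    # Each reward nudges the estimate upward, making the rewarded
    # action more likely to be chosen (exploited) on later steps.
    values[action] += alpha * (reward(action) - values[action])
```

After training, the agent prefers "b" even though it first tried it by accident — which is exactly why this approach needs a clear goal signal, and why an open-ended game like Minecraft is a poor fit for it.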

But Minecraft is a game with no clear goal. Players are free to do what they like, wandering a computer-generated world, mining different materials and combining them to make different objects.
Minecraft’s open-endedness makes it a good environment for training AI. Baker was one of the researchers behind Hide & Seek, a project in which bots were let loose in a virtual playground where they used reinforcement learning to figure out how to cooperate and use tools to win simple games. But the bots soon outgrew their surroundings. “The agents kind of took over the universe, there was nothing else for them to do,” says Baker. “We wanted to expand it and we thought Minecraft was a great domain to work in.”

Link to the rest at MIT Technology Review

PG hopes he is not alienating too many visitors with his occasional forays into artificial intelligence. It’s a topic that he finds fascinating.

As far as relevance to TPV, PG has mentioned AI writing programs, which he expects to become more and more sophisticated over time. While PG will not predict the demise of authors who are human beings, he expects AI to continue to improve and expand its writing capabilities.

Who knows, perhaps someone will take the vast sea of written wisdom PG has produced and create an AI version of PG. Such an AI would have to possess a high tolerance for randomness, however. Much of the time, there is no recognizable logic happening in PG’s brain, so there might be insufficient scaffolding to support the development of any sort of intelligent program.

2 thoughts on “A bot that watched 70,000 hours of Minecraft could unlock AI’s next big thing”

  1. Take this with a very large grain of salt.

    I’ll give the researchers the benefit of the doubt, that they restricted their “trainers” to a single version of “vanilla” Minecraft, on a single platform, in a single game mode.

    But, when you look at the videos about Minecraft on YouTube, almost every single poster is playing it on a different version, on a different platform, and in different game modes. Not to mention the (literally) thousands of “mods” to the “vanilla” game. Each and every one of those has a different strategy. Even if you tried to filter the videos to match the AI training – very, very few of the video makers identify what they are playing.

  2. Please continue with the AI, it’s perhaps the biggest thing that is coming next… And will likely pepper our century once it is mass adopted.

Comments are closed.