From The Road to AI We Can Trust:
Drawing photorealistic images is a major accomplishment for AI, but is it really a step towards general intelligence? Since DALL-E 2 came out, many people have hinted at that conclusion. When the system was announced, Sam Altman tweeted that “AGI is going to be wild”; for Kevin Roose at The New York Times, such systems constitute clear evidence that “We’re in a golden age of progress in artificial intelligence”. (Earlier this week, Scott Alexander seemed to take apparent progress in these systems as evidence for progress towards general intelligence; I expressed reservations here.)
In assessing progress towards general intelligence, the critical question should be: how much do systems like DALL-E, Imagen, Midjourney, and Stable Diffusion really understand the world, such that they can reason about and act on that knowledge? When thinking about how they fit into AI, both narrow and broad, here are three questions you could ask:
1. Can the image synthesis systems generate high-quality images?
2. Can they correlate their linguistic input with the images they produce?
3. Do they understand the world that underlies the images they represent?
On #1, the answer is a clear yes; only highly trained human artists could do better.
On #2, the answer is mixed. They do well on some inputs (like “astronaut rides horse”) but more poorly on others (like “horse rides astronaut”, which I discussed in an earlier post). (Below I will show some more examples of failure; there are many examples on the internet of impressive success, as well.)
Crucially, DALL-E and co’s potential contribution to general intelligence (“AGI”) ultimately rests on #3; if all the systems can do is convert many sentences into images in a hit-or-miss yet spectacular way, they may revolutionize the practice of art, but still not really speak to general intelligence, or even represent progress towards it.
Until this morning, I despaired of assessing what these systems understand about the world at all.
The single clearest hint I had seen thus far that they might have trouble came from the graphic designer Irina Blok:
As my 8-year-old said, reading this draft, “how does the coffee not fall out of the cup?”
The trouble, though, with asking a system like Imagen to draw impossible things is that there is no fact of the matter about what the picture should look like, so the discussion about results cycles endlessly. Maybe the system just “wanted” to draw a surrealistic image. And for that matter, maybe a person would do the same, as Michael Bronstein pointed out.
Link to the rest at The Road to AI We Can Trust