Form, function, and the giant gulf between drawing a picture and understanding the world

From The Road to AI We Can Trust:

Drawing photorealistic images is a major accomplishment for AI, but is it really a step towards general intelligence? Since DALL-E 2 came out, many people have hinted at that conclusion; when the system was announced, Sam Altman tweeted that “AGI is going to be wild”; for Kevin Roose at The New York Times, such systems constitute clear evidence that “We’re in a golden age of progress in artificial intelligence”. (Earlier this week, Scott Alexander seems to have taken apparent progress in these systems as evidence for progress towards general intelligence; I expressed reservations here.)

In assessing progress towards general intelligence, the critical question should be: how much do systems like DALL-E, Imagen, Midjourney, and Stable Diffusion really understand the world, such that they can reason and act on that knowledge? When thinking about how they fit into AI, both narrow and broad, here are three questions you could ask:

  1. Can the image synthesis systems generate high quality images?
  2. Can they correlate their linguistic input with the images they produce?
  3. Do they understand the world that underlies the images they represent?

On #1, the answer is a clear yes; only highly trained human artists could do better.

On #2, the answer is mixed. They do well on some inputs (like astronaut rides horse) but more poorly on others (like horse rides astronaut, which I discussed in an earlier post). (Below I will show some more examples of failure; there are many examples on the internet of impressive success, as well.)

Crucially, DALL-E and co’s potential contribution to general intelligence (“AGI”) ultimately rests on #3. If all the systems can do is convert many sentences into images, in a hit-or-miss yet spectacular way, they may revolutionize the practice of art but still not really speak to general intelligence, or even represent progress towards it.

Until this morning, I despaired of assessing what these systems understand about the world at all.

The clearest hint I had seen thus far that they might have trouble came from the graphic designer Irina Blok:

As my 8 year old said, reading this draft, “how does the coffee not fall out of the cup?”

The trouble, though, with asking a system like Imagen to draw impossible things is that there is no fact of the matter about what the picture should look like, so the discussion about results cycles endlessly. Maybe the system just “wanted” to draw a surrealistic image. And for that matter, maybe a person would do the same, as Michael Bronstein pointed out.

Link to the rest at The Road to AI We Can Trust

4 thoughts on “Form, function, and the giant gulf between drawing a picture and understanding the world”

  1. I’ve probably said this before in a comment on PV but it bears repetition: the term “artificial intelligence” needs to be got rid of. “Intelligence” here has no relation to what a layman understands by the term and just causes unnecessary confusion. We should probably be talking about [something or other] algorithms, where “something or other” conveys the idea of a computer program created (typically but not always) by applying machine learning and pattern recognition to large datasets.

    Of course there is no chance that the term AI will be dropped, though a variety of systems that were, or currently are, part of the AI ecosystem will probably just become part of modern tech life with no-one any longer thinking they are (colloquially) intelligent.

    And the writer’s answer of yes to his question 1 is dubious in the absence of any definition of “high quality” or limitation on “images”. I could call my DSLR an “image synthesis system” and claim the results are high quality (for some value of high quality) but even though it probably has enough processing power to helm an Apollo moon mission, intelligent it is not.

    • The most accurate terms would be the original: Expert System or, as in the MASS EFFECT games, Virtual Intelligence. Even Inference Engine.

      But the marketing guys latched onto AI because hypesters gotta hype.

    • Good point, again, Mike.

      As Felix points out, The Marketing Department is quick to seize on any promising new technology and improve it.

      The Legal Department just adds 20-30 more provisions (not counting sub-provisions) to the Terms of Use.

    • In the Fifties and early Sixties, they were called “Electronic Brains” and “Thinking Machines.”

      IBM went so far as to use “THINK” as one of their slogans and logos.

      Hallmark birthday cards have more power than those machines.
