The Alignment Problem


From The Wall Street Journal:

In the mid-1990s, a group of software developers applied the latest computer learning to tackle a problem that emergency-room doctors were routinely facing: which of the patients who showed up with pneumonia should be admitted and which could be sent home to recover there? An algorithm analyzed more than 15,000 patients and came up with a series of predictions intended to optimize patient survival. There was, however, an oddity—the computer concluded that asthmatics with pneumonia were low-risk and could be treated as outpatients. The programmers were skeptical.

Their doubts proved correct. As clinicians later explained, when asthmatics show up to an emergency room with pneumonia, they are considered so high-risk that they tend to be triaged immediately to more intensive care. It was this policy that accounted for their lower-than-expected mortality, the outcome that the computer was trying to optimize. The algorithm, in other words, provided the wrong recommendation, but it was doing exactly what it had been programmed to do.

The disconnect between intention and results—between what mathematician Norbert Wiener described as “the purpose put into the machine” and “the purpose we really desire”—defines the essence of “the alignment problem.” Brian Christian, an accomplished technology writer, offers a nuanced and captivating exploration of this white-hot topic, giving us along the way a survey of the state of machine learning and of the challenges it faces.

The alignment problem, Mr. Christian notes, is as old as the earliest attempts to persuade machines to reason, but recent advances in data-capture and computational power have given it a new prominence. To show the limits of even the most sophisticated algorithms, he describes what happened when a vast database of human language was harvested from published books and the internet. It enabled the mathematical analysis of language—facilitating dramatically improved word translations and creating opportunities to express linguistic relationships as simple arithmetical expressions. Type in “King-Man+Woman” and you got “Queen.” But if you tried “Doctor-Man+Woman,” out popped “Nurse.” “Shopkeeper-Man+Woman” produced “Housewife.” Here the math reflected, and risked perpetuating, historical sexism in language use. Another misalignment example: When an algorithm was trained on a data set of millions of labeled images, it was able to sort photos into categories as fine-grained as “Graduation”—yet classified people of color as “Gorillas.” This problem was rooted in deficiencies in the data set on which the model was trained. In both cases, the programmers had failed to recognize, much less seriously consider, the shortcomings of their models.
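Those word-arithmetic queries are typically run against pretrained word vectors. Here is a minimal Python sketch, offered purely as an illustration; the review does not name the tools, so gensim and the small GloVe vectors below are assumptions:

```python
# A minimal sketch of the word-vector arithmetic described above, using
# gensim and a small pretrained GloVe model (both are assumptions; the
# review does not say which embeddings or tools were involved).
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")  # downloads the vectors on first use

# "King - Man + Woman" is asked as: positive terms, negative terms.
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))

# Biased analogies surface the same way.
print(vectors.most_similar(positive=["doctor", "woman"], negative=["man"], topn=1))
```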

We are attracted, Mr. Christian observes, to the idea “that society can be made more consistent, more accurate, and more fair by replacing idiosyncratic human judgment with numerical models.” But we may be expecting too much of our software. A computer program intended to guide parole decisions, for example, delivered guidance that distilled and arguably propagated underlying racial inequalities. Is this the algorithm’s fault, or ours?

To answer this question and others, Mr. Christian devotes much of “The Alignment Problem” to the challenges of teaching computers to do what we want them to do. A computer seeking to maximize its score through trial and error, for example, can quickly figure out shoot-’em-up videogames like “Space Invaders” but struggles with Indiana Jones-style adventure games like “Montezuma’s Revenge,” where rewards are sparse and you need to swing across a pit and climb a ladder before you start to score. Human gamers are instinctively driven to explore and figure out what’s behind the next door, but the computer wasn’t—until a “curiosity” incentive was provided.
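One common way to add such a “curiosity” incentive is a novelty bonus layered on top of the game score, so that reaching an unfamiliar state is itself rewarding. The sketch below is a count-based toy version, offered as an illustration only; the research behind games like “Montezuma’s Revenge” used more sophisticated novelty estimates:

```python
# Minimal sketch of a count-based "curiosity" bonus added to the game score.
# Illustrative only: it rewards the agent for visiting unfamiliar states.
from collections import defaultdict
from math import sqrt

visit_counts = defaultdict(int)

def shaped_reward(state, extrinsic_reward, beta=0.1):
    """Return the game score plus a bonus that shrinks as a state grows familiar."""
    visit_counts[state] += 1
    curiosity_bonus = beta / sqrt(visit_counts[state])
    return extrinsic_reward + curiosity_bonus

# A sparse-reward game step that scores 0 still yields a small reward
# the first time a new room is reached.
print(shaped_reward(state=("room_2", "ladder"), extrinsic_reward=0.0))
```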

Link to the rest at The Wall Street Journal (PG apologizes for the paywall, but hasn’t figured out a way around it.)

When PG was in high school, The Mother of PG, aka Mom, made PG take a typing class. Learning how to type, and to type quickly, might have been the most useful thing PG learned in high school.

PG earned money in college by typing papers for other students who couldn’t type. He charged a high per-page rate and collected it because he specialized in typing for procrastinators. If you finished your rough draft at midnight, PG would show up with his portable typewriter and turn it into something your professor would accept at 8:00 am the next morning.

PG kept typing through law school, typing all his law school exams and whatever parts of the bar exam could be typed.

When PG was a baby lawyer, he had a client who was also working with a fancy law firm in Los Angeles. He went over to the fancy law firm on occasion to meet with the fancy lawyers who worked there. (He rode the elevator up to the law firm’s offices with Marlon Brando one time and Kareem Abdul-Jabbar another time. Kareem looked a lot less dissipated than Marlon.)

The fancy law firm had the first word-processing computers PG had ever seen. The firm had eight of these computers, and they were operated by the fastest and most accurate typists PG has ever seen. The machines and operators were in their own glass-walled room, and at least a couple of typists were on duty 24 hours a day. (PG was once there at midnight to pick up a rush project, and one of them delivered the finished contract to him on the spot.) PG just checked, and each of the computerized word processors cost over $180,000 in 2020 dollars.

PG was the first lawyer he knew who bought a personal computer for his law office. Fortunately, personal computers could also be used for playing videogames, so the price had come way, way, way, way down from $180,000.

Because he could still type fast, PG learned how a word processing program worked. Plus a bunch of other programs. He quickly started using his PC for legal work. Why type a document you used for a lot of different clients over and over when you could just type it once for Client A, save a copy, then use the copy as the basis for Clients B-Z?

PCs were evolving quickly, so when a more powerful PC was released, PG bought one, moved his prior PC to his secretary’s desk, and showed her how to use the word processing program.

Since PG always hired the smartest secretaries he could find, within a couple of weeks, she was better with the word processor than PG was.

For a variety of different reasons, PG started doing a lot of divorces for people who didn’t have a lot of money (the local Legal Aid office thought he did a good job and sent a lot of clients his way).

In order to make money doing divorces for people who didn’t have much (Legal Aid never had enough money, so it didn’t pay much for a divorce either), PG built a computer program so he could do the paperwork necessary for a divorce very quickly.

The wife’s name, the husband’s name, the kids’ names and ages, the year and make of the rusted-out pickup, the TV, sofa, etc., were the same from start to finish, so why not type them into a computer program once, then build standard legal forms that would use the same information for all the various forms the state legislature, in its infinite wisdom, had said were necessary to end a marriage?
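The enter-it-once idea is easy to sketch in modern terms. Splitsville long predated Python, so the field names and form text below are hypothetical, but the principle is the same:

```python
# A sketch of the enter-it-once idea behind Splitsville, using Python's
# string.Template. The field names and form text are hypothetical.
from string import Template

case_facts = {
    "wife": "Jane Doe",
    "husband": "John Doe",
    "children": "Sam (7), Alex (4)",
    "vehicle": "1971 Ford F-100 pickup",
}

petition = Template(
    "In re the Marriage of $wife and $husband.\n"
    "Minor children: $children.\n"
    "Marital property includes: $vehicle."
)

# The same case_facts dictionary can fill every other required form as well.
print(petition.substitute(case_facts))
```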

PG has meandered for too long, but to conclude quickly, he ended up building a commercial divorce computer program he named “Splitsville” and sold it to about 20% of the attorneys in the state where he was practicing at the time.

(In the United States, the laws governing divorce, AKA Dissolution of Marriage, vary from state to state, so Splitsville couldn’t cross state lines. Even though the fundamental human and property issues are the same any time a marriage is ended, PG suspects there are enough idiots in any state legislature to shout down anyone who says, “Why don’t we just do it the way Alabama does instead of concocting a divorce law of our own?”)

Which means PG doesn’t have enough knowledge to build artificial intelligence programs as described in the OP, but he does have an intuitive grasp of how to persuade computers to do things you would like them to accomplish. PG and computers seem to understand each other at a visceral level even though PG is less like a computer than a whole lot of smart people he knows. It’s sort of a Yin/Yang thing.

His liberal-arts assessment of the problem described in the OP is that the computer scientists in the OP haven’t figured out how to ask the ultra-computer for the answers they would like it to provide. A computer can do smart things and dumb things very quickly, but useful output requires understanding what you really want it to do, then figuring out how to explain the job to the computer.

But, undoubtedly, PG is missing something entirely and is totally off-base.

The Alignment Problem may be a good description of both the computer issue described in the book and of PG himself.

14 thoughts on “The Alignment Problem”

  1. Never in my years as a professional developer have I heard “alignment problem” (except at my car mechanic).

    One thing I did learn is that “GIGO” is all too frequently an excuse for a lazy developer. The first example (admissions) is GIGO. The second one (languages) is a failure to consider the characteristics of a data set that was correct, and a program that was correct – but produced unwanted results. The third (photo recognition) is a programming failure – a simplistic analysis that latched on to the first and easiest differential, skin color.

    • Agree.

      From the first time I typed a program on punch cards, the overwhelming fact about programming for me was that the machine did exactly what I told it to do. Exactly. Not what I meant, not what I wished, but the commands I punched into the cards. I found that fascinating. It told me more about what I was doing than any prior experience. Shaping a program is an exercise in determining exactly what you want to accomplish.

      GIGO? Input is exactly what it is. Garbage? Not garbage? Meaningless judgements. The program controls the transformation of the input, not the input. The right transformation spins garbage into gold. Garbage In, Garbage Out is a programming failure. Garbage In, Gold Out, a program that takes difficult input and turns it into useful information, that’s a great program.

      • “GIGO? Input is exactly what it is. Garbage? Not garbage? Meaningless judgements. The program controls the transformation of the input, not the input. ”

        Not quite.
        Data validation and bounds checking are part of good and proper programming. GIGO is about poor programming and improper usage. Both are part of the programmer’s job.

        Software can’t be developed in a vacuum, with algorithms black-boxed without concern for how they will be used or what data they will be fed. No code can ever be 100% foolproof, but human factors are part of the job, be it user interface, documentation, or data validation.
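        A minimal sketch of the kind of validation and bounds checking meant here; the field names and limits are made up for illustration:

```python
# Minimal sketch of validation and bounds checking. The field names and
# limits are made up for illustration.
def validate_patient_record(record: dict) -> list[str]:
    """Return a list of problems instead of silently passing garbage through."""
    problems = []
    age = record.get("age")
    if not isinstance(age, (int, float)) or not 0 <= age <= 120:
        problems.append(f"age out of bounds: {age!r}")
    if record.get("diagnosis") not in {"pneumonia", "asthma", "other"}:
        problems.append(f"unknown diagnosis: {record.get('diagnosis')!r}")
    return problems

print(validate_patient_record({"age": -3, "diagnosis": "pneumonia"}))
# ['age out of bounds: -3']
```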

        • You make my point. I don’t know how many times semi-competent programmers have tried to justify their sloppy or non-existent data validation with GIGO. “What can I do? It’s GIGO,” they whine. Nonsense. A lousy program blithely produces bad results from bad input. A better program rejects improper input. A superb program responds to and processes input to turn it into useful information.

    • Some valid points about GIGO, but programs aren’t alchemy. If I feed a program bogus information, I can’t get gold out. (Well, it’s true that sometimes garbage can give hackers gold, like the online bookstore that allowed users to order a negative quantity of books. Along related lines, proper domain modelling can reduce the risks of garbage in.)

      To give an example from machine vision, I can use algorithms to bring out features that aren’t very obvious in an image. But the feature has to exist in the image – not even the best AI algorithm can get any information from an all black image! So the first rule of machine vision is to pay attention to the lighting, NOT the software, since I can’t find gold in garbage images.
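      A toy numpy sketch of that point, illustrative only: a simple contrast stretch can pull out a faint feature, but it cannot conjure one from an all-black image.

```python
# Toy sketch of the machine-vision point: a contrast stretch can bring out
# a faint feature, but only if the feature exists in the image at all.
import numpy as np

def contrast_stretch(img):
    """Rescale pixel values to the full 0-255 range, if there is any range."""
    lo, hi = img.min(), img.max()
    if hi == lo:                      # an all-black (or all-anything) image
        return img.copy()             # has no information to bring out
    return ((img - lo) * 255.0 / (hi - lo)).astype(np.uint8)

faint = np.zeros((8, 8), dtype=np.uint8)
faint[3:5, 3:5] = 2                   # a feature barely above the noise floor
print(contrast_stretch(faint).max())  # 255 -- the feature is now obvious

black = np.zeros((8, 8), dtype=np.uint8)
print(contrast_stretch(black).max())  # 0 -- nothing can be recovered
```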

      • not even the best AI algorithm can get any information from an all black image!

        I recall a 1990s visit to the NY Met Museum Of Art. A prominent display in the main lobby was a canvas about 10×10 feet that was solid black.

  2. a book on making effective graphs had a couple really good examples of woefully misleading graphs

    1. a graph of the launch temps including the proposed Challenger launch (they put a tall rocket at each point, so the rockets overlapped a lot and it didn’t look like the launch temp was that much lower than prior launches; change the same graph to just points and it leaps out at you)

    2. a study done during WWII to figure out where armor needed to be placed on fighters: they put a dot showing every bullet hole reported on the model, and it showed a white area around the cockpit. This seems to imply that that area needs less armor. But it actually needs more, because hits in that area caused the fighters not to return.

    or to use a couple old proverbs

    figures don’t lie but liars figure
    garbage-in garbage-out
    lies, d*** lies, and statistics
    In theory, theory and practice are the same, in practice they are different
    the map is not the territory

  3. A computer can do smart things and dumb things very quickly, but useful output requires understanding what you really want it to do, then figuring out how to explain the job to the computer.

    In many applications, the point is to get what you want when you don’t know how to do it.

    Understanding what you want is the easy part. Today’s neural nets figure out how to do it by repetitive training. The users don’t know how the thing is making the connections.

    Zillions of days of pricing data are fed into nets for securities and futures trading, and the users don’t know or care how the thing does what it does. The programmers make the neural net program, but the users decide what it will be trained on. Maybe a trader feeds an 18-trade stochastic and a 9/21-trade exponential moving average. Another trader might feed daily point and figure data for December corn into the exact same programming product, and add a study on July/Dec spreads.

    The product of the programmer can be applied to lots of different trading situations. The programmer has no idea how the user will use the thing. If ten traders use the same product for ten different trading scenarios, the result is ten nets that do ten very different things in very different ways. But, it’s the same program. The programmer may know next to nothing about markets. He builds the hammer. Someone else swings it.
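    Here is a sketch of that same-hammer point, using scikit-learn and synthetic data; it is purely illustrative, and no real trading product is implied:

```python
# Sketch of the "same hammer, different swings" point: one model class,
# two users, two feature sets, two very different trained nets. The data
# is synthetic and the feature names are hypothetical.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Trader A feeds two momentum-style indicators.
X_a = rng.normal(size=(500, 2))
y_a = 0.8 * X_a[:, 0] - 0.3 * X_a[:, 1] + rng.normal(scale=0.1, size=500)

# Trader B feeds five spread-related features.
X_b = rng.normal(size=(500, 5))
y_b = X_b.sum(axis=1) + rng.normal(scale=0.1, size=500)

net_a = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0).fit(X_a, y_a)
net_b = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0).fit(X_b, y_b)

# Identical code, very different learned behavior.
print(net_a.predict(X_a[:1]), net_b.predict(X_b[:1]))
```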

    My favorite story is when the US Army decided to use a net to identify forces on the ground. They wanted to identify Soviet tanks vs NATO tanks. So they fed a zillion aerial pics of tanks into the net, each labeled as Soviet or NATO. Testing on another data set showed it worked very well.

    But, it failed miserably in practice. All the training pics for Soviet tanks were taken on cloudy days. The NATO pics were taken on sunny days. Therefore, the program decided NATO tanks had shadows, and Soviet tanks did not.

  4. Nothing new here, nor is it limited to computer programming.
    Rather, it is intrinsic to all human activities, especially when those in power seek to compel specific outcomes, because nobody ever knows everything relevant and humans aren’t predictable. More often than not, when forced to act in ways they disagree with, they find ways of avoiding the undesired outcome.

    In economics and engineering (which relies on economics as much as physics, chemistry, etc.) the guiding principle is known as the Law of Unintended Consequences.

    —-
    The law of unintended consequences is that actions of people—and especially of government—always have effects that are unanticipated or unintended. Economists and other social scientists have heeded its power for centuries; for just as long, politicians and popular opinion have largely ignored it.

    The concept of unintended consequences is one of the building blocks of economics. Adam Smith’s “invisible hand,” the most famous metaphor in social science, is an example of a positive unintended consequence. Smith maintained that each individual, seeking only his own gain, “is led by an invisible hand to promote an end which was no part of his intention,” that end being the public interest. “It is not from the benevolence of the butcher, or the baker, that we expect our dinner,” Smith wrote, “but from regard to their own self interest.”

    Most often, however, the law of unintended consequences illuminates the perverse unanticipated effects of legislation and regulation. In 1692 the English philosopher John Locke, a forerunner of modern economists, urged the defeat of a parliamentary bill designed to cut the maximum permissible rate of interest from 6 percent to 4 percent. Locke argued that instead of benefiting borrowers, as intended, it would hurt them. People would find ways to circumvent the law, with the costs of circumvention borne by borrowers. To the extent the law was obeyed, Locke concluded, the chief results would be less available credit and a redistribution of income away from “widows, orphans and all those who have their estates in money.”

    In the first half of the nineteenth century, the famous French economic journalist Frédéric Bastiat often distinguished in his writing between the “seen” and the “unseen.” The seen were the obvious visible consequences of an action or policy. The unseen were the less obvious, and often unintended, consequences. In his famous essay “What Is Seen and What Is Not Seen,” Bastiat wrote:

    “There is only one difference between a bad economist and a good one: the bad economist confines himself to the visible effect; the good economist takes into account both the effect that can be seen and those effects that must be foreseen. ”

    Bastiat applied his analysis to a wide range of issues, including trade barriers, taxes, and government spending.[2]

    In classic economics, the Law of Unintended Consequences is why the “Tragedy of the Commons” exists all over.

    In contemporary politics the "unseen" is usually an emergent effect of large numbers of humans acting on their individual interests instead of the wishful thinking of activists and politicians. It results in counterproductive outcomes of (generally) well-intentioned but ill-considered exercises in social engineering, such as the hollowing out of big cities' tax base through "white flight" caused by cross-city busing of children.

    In the cited example, the programmers failed to consider that the humans (doctors) were not working blindly without considering the inputs and outcomes. Most doctors are well trained in triage and don’t need software to replace their judgment and experience. The software only looked at outcomes, not at how they happened. It is the most common failing of politicians, who never consider "how do we get there from here."

    Programmers by and large are *not* unaware of this “alignment problem”. They just refer to it as GIGO. Garbage in, Garbage out.

  5. useful output requires understanding what you really want […] to do

    People have a hard time with this part in general.

  6. 42.

    In other words, maybe you could read The Hitchhiker’s Guide to the Galaxy, and learn about the limitations of computers in a fun way.

    Or, just remember the old phrase (which maybe Mr. Christian never learned) GIGO (Garbage In, Garbage Out).
