. . . .
By 2002, it seemed to Page like the time might be ripe to come back to books. With that 40-minute number in mind, he approached the University of Michigan, his alma mater and a world leader in book scanning, to find out what the state of the art in mass digitization looked like. Michigan told Page that at the current pace, digitizing their entire collection—7 million volumes—was going to take about a thousand years. Page, who’d by now given the problem some thought, replied that he thought Google could do it in six.. . . .He offered the library a deal: You let us borrow all your books, he said, and we’ll scan them for you. You’ll end up with a digital copy of every volume in your collection, and Google will end up with access to one of the great untapped troves of data left in the world. Brin put Google’s lust for library books this way: “You have thousands of years of human knowledge, and probably the highest-quality knowledge is captured in books.” What if you could feed all the knowledge that’s locked up on paper to a search engine?
By 2004, Google had started scanning. In just over a decade, after making deals with Michigan, Harvard, Stanford, Oxford, the New York Public Library, and dozens of other library systems, the company, outpacing Page’s prediction, had scanned about 25 million books. It cost them an estimated $400 million. It was a feat not just of technology but of logistics.
. . . .
The stations—which didn’t so much scan as photograph books—had been custom-built by Google from the sheet metal up. Each one could digitize books at a rate of 1,000 pages per hour. The book would lie in a specially designed motorized cradle that would adjust to the spine, locking it in place. Above, there was an array of lights and at least $1,000 worth of optics, including four cameras, two pointed at each half of the book, and a range-finding LIDAR that overlaid a three-dimensional laser grid on the book’s surface to capture the curvature of the paper. The human operator would turn pages by hand—no machine could be as quick and gentle—and fire the cameras by pressing a foot pedal, as though playing at a strange piano.
What made the system so efficient is that it left so much of the work to software. Rather than make sure that each page was aligned perfectly, and flattened, before taking a photo, which was a major source of delays in traditional book-scanning systems, cruder images of curved pages were fed to de-warping algorithms, which used the LIDAR data along with some clever mathematics to artificially bend the text back into straight lines.
. . . .
In August 2010, Google put out a blog post announcing that there were 129,864,880 books in the world. The company said they were going to scan them all.
Of course, it didn’t quite turn out that way. This particular moonshot fell about a hundred-million books short of the moon. What happened was complicated but how it started was simple: Google did that thing where you ask for forgiveness rather than permission, and forgiveness was not forthcoming. Upon hearing that Google was taking millions of books out of libraries, scanning them, and returning them as if nothing had happened, authors and publishers filed suit against the company, alleging, as the authors put it simply in their initial complaint, “massive copyright infringement.”
. . . .
As Tim Wu pointed out in a 2003 law review article, what usually becomes of these battles—what happened with piano rolls, with records, with radio, and with cable—isn’t that copyright holders squash the new technology. Instead, they cut a deal and start making money from it. Often this takes the form of a “compulsory license” in which, for example, musicians are required to license their work to the piano-roll maker, but in exchange, the piano-roll maker has to pay a fixed fee, say two cents per song, for every roll they produce. Musicians get a new stream of income, and the public gets to hear their favorite songs on the player piano. “History has shown that time and market forces often provide equilibrium in balancing interests,” Wu writes.
But even if everyone typically ends up ahead, each new cycle starts with rightsholders fearful they’re being displaced by the new technology. When the VCR came out, film executives lashed out. “I say to you that the VCR is to the American film producer and the American public as the Boston strangler is to the woman home alone,” Jack Valenti, then the president of the MPAA, testified before Congress. The major studios sued Sony, arguing that with the VCR, the company was trying to build an entire business on intellectual property theft. But Sony Corp. of America v. Universal City Studios, Inc. became famous for its holding that as long as a copying device was capable of “substantial noninfringing uses”—like someone watching home movies—its makers couldn’t be held liable for copyright infringement.
The Sony case forced the movie industry to accept the existence of VCRs. Not long after, they began to see the device as an opportunity. “The VCR turned out to be one of the most lucrative inventions—for movie producers as well as hardware manufacturers—since movie projectors,” one commentator put it in 2000.
It only took a couple of years for the authors and publishers who sued Google to realize that there was enough middle ground to make everyone happy. This was especially true when you focused on the back catalog, on out-of-print works, instead of books still on store shelves. Once you made that distinction, it was possible to see the whole project in a different light. Maybe Google wasn’t plundering anyone’s work. Maybe they were giving it a new life. Google Books could turn out to be for out-of-print books what the VCR had been for movies out of the theater.If that was true, you wouldn’t actually want to stop Google from scanning out-of-print books—you’d want to encourage it. In fact, you’d want them to go beyond just showing snippets to actually selling those books as digital downloads.. . . .
Those who had been at the table crafting the agreement had expected some resistance, but not the “parade of horribles,” as Sarnoff described it, that they eventually saw. The objections came in many flavors, but they all started with the sense that the settlement was handing to Google, and Google alone, an awesome power. “Did we want the greatest library that would ever exist to be in the hands of one giant corporation, which could really charge almost anything it wanted for access to it?”, Robert Darnton, then president of Harvard’s library, has said.
Darnton had initially been supportive of Google’s scanning project, but the settlement made him wary. The scenario he and many others feared was that the same thing that had happened to the academic journal market would happen to the Google Books database. The price would be fair at first, but once libraries and universities became dependent on the subscription, the price would rise and rise until it began to rival the usurious rates that journals were charging, where for instance by 2011 a yearly subscription to the Journal of Comparative Neurology could cost as much as $25,910.Although academics and library enthusiasts like Darnton were thrilled by the prospect of opening up out-of-print books, they saw the settlement as a kind of deal with the devil. Yes, it would create the greatest library there’s ever been—but at the expense of creating perhaps the largest bookstore, too, run by what they saw as a powerful monopolist. In their view, there had to be a better way to unlock all those books. “Indeed, most elements of the GBS settlement would seem to be in the public interest, except for the fact that the settlement restricts the benefits of the deal to Google,” the Berkeley law professor Pamela Samuelson wrote.