Torching the Modern-Day Library of Alexandria

From The Atlantic:

You were going to get one-click access to the full text of nearly every book that’s ever been published. Books still in print you’d have to pay for, but everything else—a collection slated to grow larger than the holdings at the Library of Congress, Harvard, the University of Michigan, at any of the great national libraries of Europe—would have been available for free at terminals that were going to be placed in every local library that wanted one.

At the terminal you were going to be able to search tens of millions of books and read every page of any book you found. You’d be able to highlight passages and make annotations and share them; for the first time, you’d be able to pinpoint an idea somewhere inside the vastness of the printed record, and send somebody straight to it with a link. Books would become as instantly available, searchable, copy-pasteable—as alive in the digital world—as web pages.

It was to be the realization of a long-held dream. “The universal library has been talked about for millennia,” Richard Ovenden, the head of Oxford’s Bodleian Libraries, has said. “It was possible to think in the Renaissance that you might be able to amass the whole of published knowledge in a single room or a single institution.” In the spring of 2011, it seemed we’d amassed it in a terminal small enough to fit on a desk.

“This is a watershed event and can serve as a catalyst for the reinvention of education, research, and intellectual life,” one eager observer wrote at the time.

On March 22 of that year, however, the legal agreement that would have unlocked a century’s worth of books and peppered the country with access terminals to a universal library was rejected under Rule 23(e)(2) of the Federal Rules of Civil Procedure by the U.S. District Court for the Southern District of New York.

When the library at Alexandria burned it was said to be an “international catastrophe.” When the most significant humanities project of our time was dismantled in court, the scholars, archivists, and librarians who’d had a hand in its undoing breathed a sigh of relief, for they believed, at the time, that they had narrowly averted disaster.

. . . .

Google’s secret effort to scan every book in the world, codenamed “Project Ocean,” began in earnest in 2002 when Larry Page and Marissa Mayer sat down in the office together with a 300-page book and a metronome. Page wanted to know how long it would take to scan more than a hundred million books, so he started with one that was lying around. Using the metronome to keep a steady pace, he and Mayer paged through the book cover-to-cover. It took them 40 minutes.

Page had always wanted to digitize books. Way back in 1996, the student project that eventually became Google—a “crawler” that would ingest documents and rank them for relevance against a user’s query—was actually conceived as part of an effort “to develop the enabling technologies for a single, integrated and universal digital library.” The idea was that in the future, once all books were digitized, you’d be able to map the citations among them, see which books got cited the most, and use that data to give better search results to library patrons. But books still lived mostly on paper. Page and his research partner, Sergey Brin, developed their popularity-contest-by-citation idea using pages from the World Wide Web.

By 2002, it seemed to Page like the time might be ripe to come back to books. With that 40-minute number in mind, he approached the University of Michigan, his alma mater and a world leader in book scanning, to find out what the state of the art in mass digitization looked like. Michigan told Page that at the current pace, digitizing their entire collection—7 million volumes—was going to take about a thousand years. Page, who’d by now given the problem some thought, replied that he thought Google could do it in six.

. . . .

He offered the library a deal: You let us borrow all your books, he said, and we’ll scan them for you. You’ll end up with a digital copy of every volume in your collection, and Google will end up with access to one of the great untapped troves of data left in the world. Brin put Google’s lust for library books this way: “You have thousands of years of human knowledge, and probably the highest-quality knowledge is captured in books.” What if you could feed all the knowledge that’s locked up on paper to a search engine?

By 2004, Google had started scanning. In just over a decade, after making deals with Michigan, Harvard, Stanford, Oxford, the New York Public Library, and dozens of other library systems, the company, outpacing Page’s prediction, had scanned about 25 million books. It cost them an estimated $400 million. It was a feat not just of technology but of logistics.

. . . .

The stations—which didn’t so much scan as photograph books—had been custom-built by Google from the sheet metal up. Each one could digitize books at a rate of 1,000 pages per hour. The book would lie in a specially designed motorized cradle that would adjust to the spine, locking it in place. Above, there was an array of lights and at least $1,000 worth of optics, including four cameras, two pointed at each half of the book, and a range-finding LIDAR that overlaid a three-dimensional laser grid on the book’s surface to capture the curvature of the paper. The human operator would turn pages by hand—no machine could be as quick and gentle—and fire the cameras by pressing a foot pedal, as though playing at a strange piano.

What made the system so efficient is that it left so much of the work to software. Rather than make sure that each page was aligned perfectly, and flattened, before taking a photo, which was a major source of delays in traditional book-scanning systems, cruder images of curved pages were fed to de-warping algorithms, which used the LIDAR data along with some clever mathematics to artificially bend the text back into straight lines.

. . . .

In August 2010, Google put out a blog post announcing that there were 129,864,880 books in the world. The company said they were going to scan them all.

Of course, it didn’t quite turn out that way. This particular moonshot fell about a hundred million books short of the moon. What happened was complicated, but how it started was simple: Google did that thing where you ask for forgiveness rather than permission, and forgiveness was not forthcoming. Upon hearing that Google was taking millions of books out of libraries, scanning them, and returning them as if nothing had happened, authors and publishers filed suit against the company, alleging, as the authors put it simply in their initial complaint, “massive copyright infringement.”

. . . .

As Tim Wu pointed out in a 2003 law review article, what usually becomes of these battles—what happened with piano rolls, with records, with radio, and with cable—isn’t that copyright holders squash the new technology. Instead, they cut a deal and start making money from it. Often this takes the form of a “compulsory license” in which, for example, musicians are required to license their work to the piano-roll maker, but in exchange, the piano-roll maker has to pay a fixed fee, say two cents per song, for every roll they produce. Musicians get a new stream of income, and the public gets to hear their favorite songs on the player piano. “History has shown that time and market forces often provide equilibrium in balancing interests,” Wu writes.

But even if everyone typically ends up ahead, each new cycle starts with rightsholders fearful they’re being displaced by the new technology. When the VCR came out, film executives lashed out. “I say to you that the VCR is to the American film producer and the American public as the Boston strangler is to the woman home alone,” Jack Valenti, then the president of the MPAA, testified before Congress. The major studios sued Sony, arguing that with the VCR, the company was trying to build an entire business on intellectual property theft. But Sony Corp. of America v. Universal City Studios, Inc. became famous for its holding that as long as a copying device was capable of “substantial noninfringing uses”—like someone watching home movies—its makers couldn’t be held liable for copyright infringement.

The Sony case forced the movie industry to accept the existence of VCRs. Not long after, they began to see the device as an opportunity. “The VCR turned out to be one of the most lucrative inventions—for movie producers as well as hardware manufacturers—since movie projectors,” one commentator put it in 2000.

It only took a couple of years for the authors and publishers who sued Google to realize that there was enough middle ground to make everyone happy. This was especially true when you focused on the back catalog, on out-of-print works, instead of books still on store shelves. Once you made that distinction, it was possible to see the whole project in a different light. Maybe Google wasn’t plundering anyone’s work. Maybe they were giving it a new life. Google Books could turn out to be for out-of-print books what the VCR had been for movies out of the theater.

If that was true, you wouldn’t actually want to stop Google from scanning out-of-print books—you’d want to encourage it. In fact, you’d want them to go beyond just showing snippets to actually selling those books as digital downloads.

. . . .

Those who had been at the table crafting the agreement had expected some resistance, but not the “parade of horribles,” as Sarnoff described it, that they eventually saw. The objections came in many flavors, but they all started with the sense that the settlement was handing to Google, and Google alone, an awesome power. “Did we want the greatest library that would ever exist to be in the hands of one giant corporation, which could really charge almost anything it wanted for access to it?” Robert Darnton, then president of Harvard’s library, has said.

Darnton had initially been supportive of Google’s scanning project, but the settlement made him wary. The scenario he and many others feared was that the same thing that had happened to the academic journal market would happen to the Google Books database. The price would be fair at first, but once libraries and universities became dependent on the subscription, the price would rise and rise until it began to rival the usurious rates that journals were charging, where for instance by 2011 a yearly subscription to the Journal of Comparative Neurology could cost as much as $25,910.

Although academics and library enthusiasts like Darnton were thrilled by the prospect of opening up out-of-print books, they saw the settlement as a kind of deal with the devil. Yes, it would create the greatest library there’s ever been—but at the expense of creating perhaps the largest bookstore, too, run by what they saw as a powerful monopolist. In their view, there had to be a better way to unlock all those books. “Indeed, most elements of the GBS settlement would seem to be in the public interest, except for the fact that the settlement restricts the benefits of the deal to Google,” the Berkeley law professor Pamela Samuelson wrote.

Link to the rest at The Atlantic and thanks to Valerie for the tip.

3 thoughts on “Torching the Modern-Day Library of Alexandria”

  1. The issues are many and are not mentioned in the article. One was that Harvard, U of Mich, etc. cut a money deal. Most librarians at the unis were not on board with taking authors’ works without agreement with rightsholders.

    Eric Schmidt etc. had big grabs in the style of Carnegie, Mellon, Rockefeller: take from the little guys, tell them they ought to just work harder. There are many, many issues with what happened here. It was no “Library of Alexandria,” what a crock. It was a money mountain and a grab by a huge and powerful corp which brokered secretly to make the grab. Money money money, for all but the creators. It was shameful down to the bones. It could have been otherwise, but again, those pesky authors, let’s just do away with them.

    As it stood, authors were required to opt out of Google’s grab, IF they knew about their work being digitized, given away for ad money sidebars, with no share for the author in revenues. One was required to do tons of paperwork if one had many books, mag articles, anthology contribs, etc.

    And some of the authors had the temerity to whine about being treated like 19th-century coal miners, and Goog thought authors ought not organize, protest, parry for better, when in fact, authors owned the mine, extracted the value from it, placed it for sale for a small price to feed themselves. Oh yes, the authors had it all wrong. Squatters’ rights are to reign.

    bs x10.
