How Google Book Search Got Lost

From Backchannel:

Books can do anything. As Franz Kafka once said, “A book must be the axe for the frozen sea inside us.”

It was Kafka, wasn’t it? Google confirms this. But where did he say it? Google offers links to some quotation websites, but they’re generally unreliable. (They misattribute everything, usually to Mark Twain.)

To answer such questions, you need Google Book Search, the tool that magically scours the texts of millions of digitized volumes. Just find the little “more” tab at the top of the Google results page — it’s right past Images, Videos, and News. Then click on it, find “Books,” and click on that.

. . . .

It turns out that the “frozen sea” quote is from Kafka’s Letters to Friends, Family, and Editors, in a missive to Oskar Pollak, dated January 27, 1904.

. . . .

Google Book Search is amazing that way. When it started almost 15 years ago, it also seemed impossibly ambitious: An upstart tech company that had just tamed and organized the vast informational jungle of the web would now extend the reach of its search box into the offline world. By scanning millions of printed books from the libraries with which it partnered, it would import the entire body of pre-internet writing into its database.

“You have thousands of years of human knowledge, and probably the highest-quality knowledge is captured in books,” Google cofounder Sergey Brin told The New Yorker at the time. “So not having that — it’s just too big an omission.”

. . . .

Today, Google is known for its moonshot culture, its willingness to take on gigantic challenges at global scale. Books was, by general agreement of veteran Googlers, the company’s first lunar mission. Scan All The Books!

In its youth, Google Books inspired the world with a vision of a “library of utopia” that would extend online convenience to offline wisdom. At the time it seemed like a singularity for the written word: We’d upload all those pages into the ether, and they would somehow produce a phase-shift in human awareness. Instead, Google Books has settled into a quiet middle age of sourcing quotes and serving up snippets of text from the 25 million-plus tomes in its database.

Google employees maintain that’s all they ever intended to achieve. Maybe so. But they sure got everyone else’s hopes up.

. . . .

When I started work on this story, I feared at first that Books no longer existed as a discrete part of the Google organization — that Google had actually shut the project down. As with many aspects of Google, there’s always been some secrecy around Google Books, but this time, when I started asking questions, it closed up like a startled turtle. For weeks there didn’t seem to be anyone around or available who could or would speak to the current state of the Books effort.

The Google Books “History” page trails off in 2007, and its blog stopped updating in 2012, after which it got folded into the main Google Search blog, where information about Books is nearly impossible to find. As a functioning and useful service, Google Books remained a going concern. But as a living project, with plans and announcements and institutional visibility, it seemed to have pulled a vanishing act. All of which felt weird, given the legal victory it had finally won.

When I talked to alumni of the project who’d left Google, several mentioned that they suspected the company had stopped scanning books. Eventually, I learned that there are, indeed, still some Googlers working on Book Search, and they’re still adding new books, though at a significantly slower pace than at the project’s peak around 2010–11.

. . . .

LED lighting, not widely available at the project’s start, has helped. So has studying more efficient techniques for human operators to flip pages. “It’s almost like finger-picking on a guitar,” Jaskiewicz says. “So we find people who have great ways of turning pages — where is the thumb and that kind of stuff.”

. . . .

Like many tech-friendly bibliophiles, Sloan says he uses Google Books a lot, but is sad that it isn’t continuing to evolve and amaze us. “I wish it was a big glittering beautiful useful thing that was growing and getting more interesting all the time,” he says. He also wonders: We know Google can’t legally make its millions of books available for anyone to read in full — but what if it made them available for machines to read?

Machine-learning tools that analyze texts in new ways are advancing quickly today, Sloan notes, and “the culture around it has a real Homebrew Computer Club or early web feel to it right now.” But to progress, researchers need big troves of data to feed their programs.

“If Google could find a way to take that corpus, sliced and diced by genre, topic, time period, all the ways you can divide it, and make that available to machine-learning researchers and hobbyists at universities and out in the wild, I’ll bet there’s some really interesting work that could come out of that. Nobody knows what,” Sloan says. He assumes Google is already doing this internally. Jaskiewicz and others at Google would not say.

Link to the rest at Backchannel

7 thoughts on “How Google Book Search Got Lost”

    • Me too. It’s wonderful for finding that quote that teases at your memory, but you can’t remember the source. And for doing things like finding out whether British writers of fiction set in Tudor times write Mr. or Master. More focused and faster than going to the library and turning lots of pages :-).

      But Google also burned a lot of libraries (figuratively!), by means of great arrogance, peremptory demands, and not giving back the scanned files as promised for the libraries to incorporate into their digital repositories. So it’s possible that their slowdown is partly related to more reluctance among potential partners.

  1. As the judge pointed out, the AG had no right to speak for authors in general.

    Something that shouldn’t be forgotten…

  2. As with many types of fight, losing your forward momentum can be a killer, or in this case kill a project. (Which was what the other side was aiming for.)

  3. They could’ve avoided a lot of grief if they had simply ignored the AG instead of taking them at their word and trying to settle with them. As the judge pointed out, the AG had no right to speak for authors in general.

  4. I suspect that the ongoing lawsuits against Google have had something to do with their apparent lack of progress.

    It’s also probably slowed down the scanning by making it harder to find sources of new books to scan (if there’s outrage every time a library agrees to let Google scan their books, it discourages other libraries from doing so).
