Jodie Archer had always been puzzled by the success of The Da Vinci Code. She’d worked for Penguin UK in the mid-2000s, when Dan Brown’s thriller had become a massive hit, and knew there was no way marketing alone would have led to 80 million copies sold. So what was it, then? Something magical about the words that Brown had strung together? Dumb luck? The questions stuck with her even after she left Penguin in 2007 to get a PhD in English at Stanford. There she met Matthew L. Jockers, a cofounder of the Stanford Literary Lab, whose work in text analysis had convinced him that computers could peer into books in a way that people never could.
Soon the two of them went to work on the “bestseller” problem: How could you know which books would be blockbusters and which would flop, and why? Over four years, Archer and Jockers fed 5,000 fiction titles published over the last 30 years into computers and trained them to “read”—to determine where sentences begin and end, to identify parts of speech, to map out plots. They then used so-called machine classification algorithms to isolate the features most common in bestsellers.
The result of their work—detailed in The Bestseller Code, out this month—is an algorithm built to predict, with 80 percent accuracy, which novels will become mega-bestsellers. What does it like? Young, strong heroines who are also misfits (the type found in *The Girl on the Train*, *Gone Girl*, and *The Girl with the Dragon Tattoo*). No sex, just “human closeness.” Frequent use of the verb “need.” Lots of contractions. Not a lot of exclamation marks. Dogs, yes; cats, meh. In all, the “bestseller-ometer” has identified 2,799 features strongly associated with bestsellers.
What Archer and Jockers have done is just one part of a larger movement in the publishing industry to replace gut instinct and wishful thinking with data. A handful of startups in the US and abroad claim to have created their own algorithms or other data-driven approaches that can help them pick novels and nonfiction topics that readers will love, as well as understand which books work for which audiences. Meanwhile, traditional publishers are doing their own experiments: Simon & Schuster hired its first data scientist last year; in May, Macmillan Publishers acquired the digital book publishing platform Pronoun, in part for its data and analytics capabilities.
While these efforts could bring more profit to an oft-struggling industry, the effect for readers is unclear.
“Part of the beautiful thing about books, unlike refrigerators or something, is that sometimes you pick up a book that you don’t know,” says Katherine Flynn, a partner at Boston-based literary agency Kneerim & Williams. “You get exposed to things you wouldn’t have necessarily thought you liked. You thought you liked tennis, but you can read a book about basketball. It’s sad to think that data could narrow our tastes and possibilities.”
They Know What You Did Last Night
Once, publishers had to rely on unit sales to figure out what readers wanted. Digital reading changed that. Publishers can know that you raced through a novel to the end, or that you abandoned it after 20 pages. They can know where and when you’re reading. On some reading sites and apps, users sign in with their Facebook accounts, opening up more personal data. There’s a wrinkle, though: Companies such as Amazon and Apple have the data for books read on their devices, and they aren’t sharing it with publishers.
The ability to know who reads what and how fast is also driving Berlin-based startup Inkitt. Founded by Ali Albazaz, who started coding at age 10, the English-language website invites writers to post their novels for all to see. Inkitt’s algorithms examine reading patterns and engagement levels. For the best performers, Inkitt offers to act as literary agent, pitching the works to traditional publishers and keeping the standard 15 percent commission if a deal results. The site went public in January 2015 and now has 80,000 stories and more than half a million readers around the world.
Albazaz, now 26, sees himself as democratizing the publishing world. “We never, ever, ever judge the books. That’s not our job. We check that the formatting is correct, the grammar is in place, we make sure that the cover is not pixelated,” he says. “Who are we to judge if the plot is good? That’s the job of the market. That’s the job of the readers.”
. . . .
The Data Scare
As Archer and Jockers shopped the *Bestseller Code* manuscript to acquisitions editors, word of their powerful algorithm spread—as did worry and suspicion among those in the publishing profession. “The fear is we can homogenize the market or try and somehow take their jobs away from them, and the answer is no and no,” says Archer. “What the bestseller-ometer is trying to do is say, ‘Hey, pick this new author that you might not dare take a risk on with your acquisitions budget. Their chance is really good.’” Archer, now a writer in Boulder, Colorado, insists that she and Jockers, now an English professor at the University of Nebraska-Lincoln, are “literature-friendly” and want good books to succeed.
Andrew Weber, the global chief operating officer for Macmillan Publishers—whose St. Martin’s Press is publishing *The Bestseller Code*—thinks algorithms should be viewed as an additional piece of information, rather than as an excuse to fire the editors. “Whether it’s in acquisition, whether it’s in pricing, whether it’s in marketing, whether it’s in distribution, there just seem to be many, many, many opportunities to improve the quality of our decision-making—and therefore hopefully our results—by bringing data into the equation,” says Weber. “I would say we are still in the early days of that journey, but that’s the direction we’re headed.”
Archer and Jockers watched eagerly to see which novel would be their algorithm’s favorite. It turned out to be The Circle, a 2013 technothriller by Dave Eggers about working for a massively powerful Internet company. The Circle spent multiple weeks on both The New York Times hardcover fiction and paperback trade fiction bestseller lists. A movie version starring Emma Watson and Tom Hanks is expected in theaters this year.
Link to the rest at Wired
It appears that PG missed this when it first appeared in 2016.
He suspects the almost-universal phobia towards computers, algorithms, quantitative analysis, sophisticated metrics, etc., among the indwellers of traditional publishing is related to the widespread incidence of innumeracy among English majors.
Worship of The Golden Gut is the state religion of this group. For them, no collection of numbers and formulae can ever replace The Hunch. That’s one reason why so many books fail to earn out their advances, and why so many mega-sellers are first rejected by every major publisher before stumbling into the market and finding success.
Indie authors include a much wider slice of humanity than either publishers or traditionally-published authors. That diversity of talent and background, combined with Amazon’s relentless pursuit of customers and, thus, numbers, analytics, categories, sub-categories and sub-sub-categories, fosters the creation of niches within niches all the way down to the micro-reader level.
PG just checked a random book on the Zon and discovered that it encouraged drill-down and discovery as follows:
* Mystery, Thriller & Suspense
* Thrillers & Suspense
With broad categories mentioned:
Book Fiction Moods
Book Mystery Characters
(PG is not certain how much of this collection of information is presented as a result of PG’s and Mrs. PG’s past buying habits.)
Finally, if you prefer, you could check out 383 different categories, series, spinoffs, heroes/heroines, etc., etc., etc. (including 盗墓笔记, El cementerio de los libros, Svartåsen and Die Krimi-Serie in den Zwanzigern), as follows: