From Wired:
Jodie Archer had always been puzzled by the success of The Da Vinci Code. She’d worked for Penguin UK in the mid-2000s, when Dan Brown’s thriller had become a massive hit, and knew there was no way marketing alone would have led to 80 million copies sold. So what was it, then? Something magical about the words that Brown had strung together? Dumb luck? The questions stuck with her even after she left Penguin in 2007 to get a PhD in English at Stanford. There she met Matthew L. Jockers, a cofounder of the Stanford Literary Lab, whose work in text analysis had convinced him that computers could peer into books in a way that people never could.
Soon the two of them went to work on the “bestseller” problem: How could you know which books would be blockbusters and which would flop, and why? Over four years, Archer and Jockers fed 5,000 fiction titles published over the last 30 years into computers and trained them to “read”—to determine where sentences begin and end, to identify parts of speech, to map out plots. They then used so-called machine classification algorithms to isolate the features most common in bestsellers.
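The pipeline the article describes — parse texts into surface features, then train a classifier to separate bestsellers from the rest — might be sketched roughly like this. This is a stdlib-only toy: the features are made up to echo ones the article mentions (contractions, the verb "need," exclamation marks), and the nearest-centroid rule is a stand-in for whatever classification algorithms Archer and Jockers actually used, which the article doesn't specify.

```python
from collections import Counter
import re

# Toy stand-ins for the kind of surface features the article describes:
# contraction rate, frequency of the verb "need", exclamation-mark rate.
def extract_features(text):
    words = re.findall(r"[A-Za-z']+", text.lower())
    n = max(len(words), 1)
    counts = Counter(words)
    return {
        "contraction_rate": sum(v for w, v in counts.items() if "'" in w) / n,
        "need_rate": counts["need"] / n,
        "exclaim_rate": text.count("!") / n,
    }

# A nearest-centroid "classifier": average the feature vectors of each
# class, then label a new text by whichever class average it sits closer to.
def centroid(feature_dicts):
    keys = feature_dicts[0].keys()
    return {k: sum(d[k] for d in feature_dicts) / len(feature_dicts) for k in keys}

def classify(text, best_centroid, other_centroid):
    f = extract_features(text)
    def dist(c):
        return sum((f[k] - c[k]) ** 2 for k in c)
    return "bestseller" if dist(best_centroid) < dist(other_centroid) else "other"
```

The real project evidently extracted thousands of such features (2,799 of them, per the article) rather than three, but the shape of the computation is the same: texts in, feature vectors out, classifier on top.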
The result of their work—detailed in The Bestseller Code, out this month—is an algorithm built to predict, with 80 percent accuracy, which novels will become mega-bestsellers. What does it like? Young, strong heroines who are also misfits (the type found in *The Girl on the Train*, *Gone Girl*, and *The Girl with the Dragon Tattoo*). No sex, just “human closeness.” Frequent use of the verb “need.” Lots of contractions. Not a lot of exclamation marks. Dogs, yes; cats, meh. In all, the “bestseller-ometer” has identified 2,799 features strongly associated with bestsellers.
What Archer and Jockers have done is just one part of a larger movement in the publishing industry to replace gut instinct and wishful thinking with data. A handful of startups in the US and abroad claim to have created their own algorithms or other data-driven approaches that can help them pick novels and nonfiction topics that readers will love, as well as understand which books work for which audiences. Meanwhile, traditional publishers are doing their own experiments: Simon & Schuster hired its first data scientist last year; in May, Macmillan Publishers acquired the digital book publishing platform Pronoun, in part for its data and analytics capabilities.
While these efforts could bring more profit to an oft-struggling industry, the effect for readers is unclear.
“Part of the beautiful thing about books, unlike refrigerators or something, is that sometimes you pick up a book that you don’t know,” says Katherine Flynn, a partner at Boston-based literary agency Kneerim & Williams. “You get exposed to things you wouldn’t have necessarily thought you liked. You thought you liked tennis, but you can read a book about basketball. It’s sad to think that data could narrow our tastes and possibilities.”
They Know What You Did Last Night
Once, publishers had to rely on unit sales to figure out what readers wanted. Digital reading changed that. Publishers can know that you raced through a novel to the end, or that you abandoned it after 20 pages. They can know where and when you’re reading. On some reading sites and apps, users sign in with their Facebook accounts, opening up more personal data. There’s a wrinkle, though: Companies such as Amazon and Apple have the data for books read on their devices, and they aren’t sharing it with publishers.
London-based startup Jellybooks offers a workaround. Publishers can hire Jellybooks to conduct virtual focus groups, giving readers free ebooks, often in advance of publication, in exchange for their sharing data on how much, when, and where they read. JavaScript is embedded in the books, and at the end of each chapter, readers are asked to click a link that sends the data to Jellybooks. In almost two years, the company has run tests for publishers in the US, England, and Germany, and uncovered one sobering fact: Most novels are abandoned before readers are halfway through them. Jellybooks’s findings can guide publishers on their marketing, and even whether it’s worth signing an author again. “Hollywood moguls might do test screenings for movies to decide on how much [marketing] budget a movie should get,” says Andrew Rhomberg, the founder of Jellybooks. “That was never done for books.”
The ability to know who reads what and how fast is also driving Berlin-based startup Inkitt. Founded by Ali Albazaz, who started coding at age 10, the English-language website invites writers to post their novels for all to see. Inkitt’s algorithms examine reading patterns and engagement levels. For the best performers, Inkitt offers to act as literary agent, pitching the works to traditional publishers and keeping the standard 15 percent commission if a deal results. The site went public in January 2015 and now has 80,000 stories and more than half a million readers around the world.
Albazaz, now 26, sees himself as democratizing the publishing world. “We never, ever, ever judge the books. That’s not our job. We check that the formatting is correct, the grammar is in place, we make sure that the cover is not pixelated,” he says. “Who are we to judge if the plot is good? That’s the job of the market. That’s the job of the readers.”
. . . .
The Data Scare
As Archer and Jockers shopped *The Bestseller Code* manuscript to acquisitions editors, word of their powerful algorithm spread—as did worry and suspicion among those in the publishing profession. “The fear is we can homogenize the market or try and somehow take their jobs away from them, and the answer is no and no,” says Archer. “What the bestseller-ometer is trying to do is say, ‘Hey, pick this new author that you might not dare take a risk on with your acquisitions budget. Their chance is really good.’” Archer, now a writer in Boulder, Colorado, insists that she and Jockers, now an English professor at the University of Nebraska-Lincoln, are “literature-friendly” and want good books to succeed.
Andrew Weber, the global chief operating officer for Macmillan Publishers—whose St. Martin’s Press is publishing *The Bestseller Code*—thinks algorithms should be viewed as an additional piece of information, rather than as an excuse to fire the editors. “Whether it’s in acquisition, whether it’s in pricing, whether it’s in marketing, whether it’s in distribution, there just seem to be many, many, many opportunities to improve the quality of our decision-making—and therefore hopefully our results—by bringing data into the equation,” says Weber. “I would say we are still in the early days of that journey, but that’s the direction we’re headed.”
Archer and Jockers watched eagerly to see which novel would be their algorithm’s favorite. It turned out to be The Circle, a 2013 technothriller by Dave Eggers about working for a massively powerful Internet company. The Circle spent multiple weeks on both The New York Times hardcover fiction and paperback trade fiction bestseller lists. A movie version starring Emma Watson and Tom Hanks is expected in theaters this year.
Link to the rest at Wired
It appears that PG missed this when it first appeared in 2016.
He suspects the almost-universal phobia towards computers, algorithms, quantitative analysis, sophisticated metrics, etc., among the indwellers of traditional publishing is related to the widespread incidence of innumeracy among English majors.
Worship of The Golden Gut is the state religion of this group. For them, no collection of numbers and formulae can ever replace The Hunch. That’s one reason why so many books fail to earn out their advances, and why so many mega-sellers are first rejected by every major publisher before stumbling into the market and finding success.
Indie authors include a much wider slice of humanity than either publishers or traditionally-published authors. That diversity of talent and background, combined with Amazon’s relentless pursuit of customers and, thus, of numbers, analytics, categories, sub-categories, and sub-sub-categories, fosters the creation of niches within niches all the way down to the micro-reader level.
PG just checked a random book on the Zon and discovered that it encouraged drill-down and discovery as follows:
* Books
* Mystery, Thriller & Suspense
* Thrillers & Suspense
* Suspense
(PG is not certain how much of this collection of information is presented as a result of PG’s and Mrs. PG’s past buying habits.)
Finally, if you prefer, you could check out 383 different categories, series, spinoffs, heroes/heroines, etc., etc., etc. (including 盗墓笔记, El cementerio de los libros, Svartåsen, and Die Krimi-Serie in den Zwanzigern).
It’s a lot worse than PG says. Publishing executives aren’t precisely innumerate — they can plug numbers into spreadsheets to make decisions for them all day long. The problem is that the entertainment industry is anti-science, perhaps the epitome of C.P. Snow’s “Two Cultures” problem: Because the subject of the entertainment industry is “art,” gathering and evaluating actual data gets no attention whatsoever.
Consider the most obvious flaw in the Archer and Jockers project (other, that is, than that it is extolled as cutting-edge in Wired, which is almost always a tinfoil-hat warning): It was conducted as if each work stood alone, not just apart from any series it belonged to, but outside its time and its immediate social context. Consider the rise of badly-limned followers-of-Mohammed “bad guys” starting in the late 1990s and accelerating rapidly afterward, compared to followers-of-Stalin “bad guys” in the same period. And so on.
And as flawed as that project is, it’s vastly more searching than anything done in the publishing industry. For example, there’s a meme that trade books with predominantly green covers don’t sell. It was out of date when it was thrown at me in the 1990s: It was based on a combination of lighting characteristics and ink chemistries from the early 1960s that existed practically nowhere by 1990. Similarly for “embossed lettering sells books” (how does that work on Amazon, BTW — let alone for e-books, when the metallic shades most common in embossed lettering get distorted by 56 different types of displays?). But I’ve seen both of these memes presented as absolute, irrefutable fact in the last two years.
The real problem is that management doesn’t want to know anything that might require it to make expensive changes to its existing system.
Because the subject of the entertainment industry is “art,” gathering and evaluating actual data gets no attention whatsoever.
TV ratings do seem to get a passing glance.
… which assumes:
* That TV ratings are meaningful as to actual perception of the programming (example: how much of Super Bowl ratings are for the commercials?)
* That the means of gathering TV ratings bear some relationship to reality
* That the time over which TV ratings are gathered for different programs allows comparison (is “same day plus three” comparable for football games, weekday soap operas, Judge Judy, Survivor, and the final episode of M*A*S*H?)
* That “ratings” directly correlate to “enthusiasm”

Remember, the ratings are not used to establish anything except “rate chargeable to advertisers”; they don’t actually reflect revenue (there is more than one program currently airing with higher ratings that can’t ordinarily sell all of its ad time at the full rate, for example).
In short, “tracking stuff to which a number can be attached” is not necessarily gathering valid data.
… which assumes:
It assumes nothing.
It’s an observation that ratings are actual data that gets significant attention from the entertainment industry. And, as you say, that data reflects what advertisers will pay. Price is one of the important decision variables in a for-profit operation.
And is entertainment art? I don’t know. Who cares? Not consumers, not advertisers, and not the entertainment industry. Maybe artists care.
The people who interpret the data do, in fact, assume every one of the things that CE mentioned and you so airily dismissed. Raw data is of very little use unless you understand what you are measuring, why it is (or isn’t) important, and what its limitations are.
The claim was, “gathering and evaluating actual data gets no attention whatsoever.”
If one doesn’t understand any of the things I mentioned, one is not paying any attention whatsoever to evaluating the data. I suppose you could quibble about whether ‘gathering and evaluating’ is to be construed conjunctively or disjunctively.
I’m too dumb to know what that means.
Then you might not be the person best equipped to assess the quality of someone else’s data analysis.
Equipped conjunctively or disjunctively?
The dictionary is your friend.
The dictionary is your friend.
I’m still too dumb to know what that means.
At least you admit it.
Uh, ratings *aren’t* actual data.
They are algorithmically processed incomplete data.
Instead of actual viewers, they are a weighted aggregate of live viewers, DVR viewers, and first-week streaming viewers, with each category getting a different multiplier.
That is why the highest ratings go to sports and why advertisers pay more for them: they aren’t DVR-able, and viewers can’t pause and return a day later.
Nielsen doesn’t even collect full data. They estimate viewership by varying means, some of which are self-reported.
It is simply a standardized metric that ranks programs’ estimated mass appeal. It doesn’t even come close to measuring profitability, since many low-rated shows are gold mines for their producers while higher-rated productions are money losers, which means the metric isn’t even good at what it is really supposed to measure. Think of saddle shows: half-hour shows slotted between two popular shows. They have good ratings while carried by the saddle, but tank when set as the lead show in a block because the show lacks intrinsic appeal. More often than not, watching it is simply preferred to watching the second half hour of a one-hour show.
So no, ratings aren’t real data and the TV world knows it.
They just have nothing better.
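The weighting the commenter describes can be illustrated with a toy calculation. The category multipliers below are invented purely for illustration; the actual weights and categories used by ratings services are more involved and not public in any simple form.

```python
# Hypothetical per-category multipliers -- invented for illustration,
# not the weights any ratings service actually uses.
WEIGHTS = {"live": 1.0, "dvr": 0.5, "streaming_week1": 0.25}

def weighted_rating(viewers):
    """Combine per-category viewer estimates into one weighted figure."""
    return sum(WEIGHTS[cat] * count for cat, count in viewers.items())

# Two hypothetical programs with the SAME total audience (10.3M each):
# a live-heavy one (think sports) and a heavily time-shifted drama.
sports = {"live": 10_000_000, "dvr": 200_000, "streaming_week1": 100_000}
drama = {"live": 4_000_000, "dvr": 5_000_000, "streaming_week1": 1_300_000}
```

Under any weighting that discounts time-shifted viewing, the live-heavy program scores higher despite the identical total audience, which is one mechanical reason sports command premium ad rates.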
I agree ratings data don’t show profitability. Who but the conjunctively disjunctive think they come close? They don’t consider costs.
If they used a training set of 5,000 books, how did the algorithm perform when faced with a test of another 5,000 books it had never seen?
I’d also ask why they chose to publish rather than shop their system to publishers.
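The first question above is about held-out evaluation: an accuracy figure measured on the same 5,000 books the model was trained on would be close to meaningless. A minimal sketch of the standard protocol, with toy data and a deliberately dumb always-predict-"hit" model, just to show the split-then-score shape (the article doesn't say how Archer and Jockers validated their 80 percent figure):

```python
import random

def train_test_split(items, test_fraction=0.5, seed=42):
    """Shuffle and split so the model never sees the test items in training."""
    rng = random.Random(seed)
    shuffled = items[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def evaluate(predict, labeled_items):
    """Fraction of held-out items the model labels correctly."""
    correct = sum(1 for features, label in labeled_items if predict(features) == label)
    return correct / len(labeled_items)
```

The point is the discipline, not the code: whatever classifier you train, its advertised accuracy should come from `evaluate` run on the held-out half, never on the training half.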