French Media Complain About Google

From The Japan Times:

French media organizations lodged a complaint against Google with the country’s competition authority Wednesday over the U.S. internet giant’s refusal to pay for displaying their content.

The move sets up a legal fight with Google over a new EU copyright law that could have huge repercussions for the future of the press.

Earlier this year France became the first country to ratify the law, which aims to ensure publishers are compensated when their work is displayed online.

The APIG press alliance, which groups dozens of national and regional newspapers, and the union of magazine editors argued that Google is flouting the law, as did AFP, which has brought a separate complaint.

. . . .

But Google — which holds a virtual monopoly on internet searches — said articles, pictures and videos will be shown in search results only if media groups consent to let the tech giant use them for free.

If they refuse, only a headline and a bare link to the content will appear, Google said, almost certainly resulting in a loss of visibility and potential ad revenue for the publisher.

Google was effectively offering the press a choice about how it would like to die — “either from cholera or the plague,” said Jean-Michel Baylet, the APIG president.

. . . .

“Google helps internet users find news content from many sources and the results are always based on relevance, not trade agreements,” it said in a statement to AFP last month.

The company insisted that “publishers have never had so many choices about how their content is displayed on Google.”

“The law does not impose a fee for posting links, and European news publishers already derive significant value from the 8 billion visits they receive each month from internet users who do searches on Google,” it said.

But French President Emmanuel Macron has already voiced his support for the press, saying that no company can “break free” of the law in France.

Link to the rest at The Japan Times

Tracking Phones, Google Is a Dragnet for the Police

Not actually to do with writing (except for crime novelists), but interesting and creepy – a little Monday paranoia pick-me-up.

From The New York Times:

When detectives in a Phoenix suburb arrested a warehouse worker in a murder investigation last December, they credited a new technique with breaking open the case after other leads went cold.

The police told the suspect, Jorge Molina, they had data tracking his phone to the site where a man was shot nine months earlier. They had made the discovery after obtaining a search warrant that required Google to provide information on all devices it recorded near the killing, potentially capturing the whereabouts of anyone in the area.

Investigators also had other circumstantial evidence, including security video of someone firing a gun from a white Honda Civic, the same model that Mr. Molina owned, though they could not see the license plate or attacker.

But after he spent nearly a week in jail, the case against Mr. Molina fell apart as investigators learned new information and released him. Last month, the police arrested another man: his mother’s ex-boyfriend, who had sometimes used Mr. Molina’s car.

. . . .

The warrants, which draw on an enormous Google database employees call Sensorvault, turn the business of tracking cellphone users’ locations into a digital dragnet for law enforcement. In an era of ubiquitous data gathering by tech companies, it is just the latest example of how personal information — where you go, who your friends are, what you read, eat and watch, and when you do it — is being used for purposes many people never expected. As privacy concerns have mounted among consumers, policymakers and regulators, tech companies have come under intensifying scrutiny over their data collection practices.

The Arizona case demonstrates the promise and perils of the new investigative technique, whose use has risen sharply in the past six months, according to Google employees familiar with the requests. It can help solve crimes. But it can also snare innocent people.

Technology companies have for years responded to court orders for specific users’ information. The new warrants go further, suggesting possible suspects and witnesses in the absence of other clues. Often, Google employees said, the company responds to a single warrant with location information on dozens or hundreds of devices.

. . . .

The practice was first used by federal agents in 2016, according to Google employees, and first publicly reported last year in North Carolina. It has since spread to local departments across the country, including in California, Florida, Minnesota and Washington. This year, one Google employee said, the company received as many as 180 requests in one week.

. . . .

The technique illustrates a phenomenon privacy advocates have long referred to as the “if you build it, they will come” principle — anytime a technology company creates a system that could be used in surveillance, law enforcement inevitably comes knocking. Sensorvault, according to Google employees, includes detailed location records involving at least hundreds of millions of devices worldwide and dating back nearly a decade.

The new orders, sometimes called “geofence” warrants, specify an area and a time period, and Google gathers information from Sensorvault about the devices that were there. It labels them with anonymous ID numbers, and detectives look at locations and movement patterns to see if any appear relevant to the crime. Once they narrow the field to a few devices they think belong to suspects or witnesses, Google reveals the users’ names and other information.

Link to the rest at The New York Times
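The Times’s description of a geofence warrant amounts to a two-step query. First, filter stored location records by an area and a time window and hand back pseudonymous IDs. Second, unmask only the devices investigators single out. The minimal Python sketch below shows that flow under stated assumptions. It uses a simple latitude/longitude box, made-up coordinates and sequential "anon-N" labels. Sensorvault’s real schema and Google’s actual process are not public, so every name here (LocationRecord, Geofence, geofence_query, unmask) is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LocationRecord:
    device_id: str   # hypothetical real device/account identifier
    lat: float
    lon: float
    when: datetime


@dataclass(frozen=True)
class Geofence:
    """An area (here a simple lat/lon box) plus a time window (assumption)."""
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float
    start: datetime
    end: datetime

    def contains(self, r: LocationRecord) -> bool:
        return (self.min_lat <= r.lat <= self.max_lat
                and self.min_lon <= r.lon <= self.max_lon
                and self.start <= r.when <= self.end)


def geofence_query(records, fence):
    """Step 1: collect every record inside the fence, labeled with anonymous IDs.

    Returns the anonymized hits plus a private anon->real mapping that the
    data holder keeps until specific devices are singled out.
    """
    anon_ids = {}  # real device_id -> anonymous label, stable per device
    hits = []
    for r in records:
        if not fence.contains(r):
            continue
        if r.device_id not in anon_ids:
            anon_ids[r.device_id] = f"anon-{len(anon_ids) + 1}"
        hits.append((anon_ids[r.device_id], r.lat, r.lon, r.when))
    reveal_map = {anon: real for real, anon in anon_ids.items()}
    return hits, reveal_map


def unmask(selected, reveal_map):
    """Step 2: reveal real identifiers only for the devices investigators picked."""
    return {anon: reveal_map[anon] for anon in selected}


# Usage with made-up records.
t = datetime(2018, 3, 1, 22, 0)
records = [
    LocationRecord("device-A", 33.40, -111.80, t),
    LocationRecord("device-B", 33.40, -111.80, t.replace(hour=9)),  # outside the time window
    LocationRecord("device-C", 34.00, -111.80, t),                  # outside the area
    LocationRecord("device-A", 33.41, -111.79, t.replace(minute=5)),
]
fence = Geofence(33.39, 33.42, -111.81, -111.78, t.replace(hour=21), t.replace(hour=23))
hits, reveal_map = geofence_query(records, fence)
print([h[0] for h in hits])            # ['anon-1', 'anon-1']
print(unmask(["anon-1"], reveal_map))  # {'anon-1': 'device-A'}
```

The anonymous label stays the same across all of a device’s records. That is what would let detectives study movement patterns before any name is revealed. It is also why a device that merely passed through the area still ends up in the results.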

A quick search turned up a couple of interesting articles about erasing your mobile history. PG can’t assess how effective these solutions are, because they depend on whether the large organizations that create the software on your phone actually erase all the history they have acquired about you.

How to stop Google from tracking everything you do online

How to Clear History on an iPhone

If you want to make certain your phone won’t collect or transmit location information when you’re not actively using it, you can put it in a Faraday bag, which prevents the phone from sending or receiving electronic signals for as long as it is inside. A Faraday bag is a smaller version of a Faraday cage.

During his brief excursion into electronic tracking, PG learned that police are urged to place cell phones and laptops seized as evidence in criminal investigations inside Faraday evidence bags, to keep them from being remotely erased or from broadcasting their locations to bad guys and bad gals.

Logic Bombs

In true spy-vs.-spy tradition, Faraday evidence bags may not solve every evidence-preservation problem.

From Forensic Magazine:

The Faraday bag will not protect the device from internal data alteration by mechanisms such as logic bombs. A logic bomb is set to go off if certain conditions are met. If a person had to press a certain set of keys simultaneously every day to keep a destructive program from running on the cell phone, that would be one example of a logic bomb. A phone seized from someone may be protected from outside control by hackers if it is kept in a Faraday bag, but it may still fall victim to a logic bomb if the required conditions are not met while the phone is in the possession of the CCE.

Link to the rest at Forensic Magazine
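Programmers usually call the key-press example a dead man’s switch: the destructive step runs when an expected check-in does not happen. The toy Python model below illustrates why a Faraday bag doesn’t help. The timer runs on the device’s own clock and needs no signal from outside. The class name, the one-day window and the payload are all hypothetical. The payload here only clears an in-memory dict.

```python
from datetime import datetime, timedelta


class DeadMansSwitch:
    """Toy model: a 'destructive' action fires unless the owner checks in
    (e.g. presses a secret key combination) often enough.

    Nothing here touches real files; the payload is whatever callable you pass.
    """

    def __init__(self, max_gap: timedelta, payload):
        self.max_gap = max_gap      # longest allowed gap between check-ins
        self.payload = payload      # action to run when a check-in is missed
        self.last_check_in = None   # switch is unarmed until the first check-in
        self.fired = False

    def check_in(self, now: datetime) -> None:
        # The owner performs the secret action (the daily key press in the example).
        self.last_check_in = now

    def tick(self, now: datetime) -> bool:
        # Called periodically by the device's own scheduler. No network is
        # involved, so blocking radio signals does not stop it.
        if self.fired or self.last_check_in is None:
            return self.fired
        if now - self.last_check_in > self.max_gap:
            self.payload()
            self.fired = True
        return self.fired


# Usage: a phone sits in an evidence bag and nobody presses the keys.
data = {"notes": "evidence"}
switch = DeadMansSwitch(timedelta(days=1), payload=data.clear)
t0 = datetime(2019, 1, 1, 8, 0)
switch.check_in(t0)
print(switch.tick(t0 + timedelta(hours=20)))  # False: still within the window
print(switch.tick(t0 + timedelta(hours=30)))  # True: check-in missed, payload runs
print(data)                                   # {}
```

The trigger is defined by an absence, which matches the article’s wording: the bomb goes off "if certain conditions are not met." Keeping the device isolated from the network does nothing to supply the missing check-in.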

Google Exposed User Data, Feared Repercussions of Disclosing to Public

From The Wall Street Journal:

Google exposed the private data of hundreds of thousands of users of the Google+ social network and then opted not to disclose the issue this past spring, in part because of fears that doing so would draw regulatory scrutiny and cause reputational damage, according to people briefed on the incident and documents reviewed by The Wall Street Journal.

As part of its response to the incident, the Alphabet Inc. unit plans to announce a sweeping set of data privacy measures that include permanently shutting down all consumer functionality of Google+, the people said. The move effectively puts the final nail in the coffin of a product that was launched in 2011 to challenge Facebook Inc. and is widely seen as one of Google’s biggest failures.

A software glitch in the social site gave outside developers potential access to private Google+ profile data between 2015 and March 2018, when internal investigators discovered and fixed the issue, according to the documents and people briefed on the incident. A memo reviewed by the Journal prepared by Google’s legal and policy staff and shared with senior executives warned that disclosing the incident would likely trigger “immediate regulatory interest” and invite comparisons to Facebook’s leak of user information to data firm Cambridge Analytica.

. . . .

Chief Executive Sundar Pichai was briefed on the plan not to notify users after an internal committee had reached that decision, the people said.

The planned closure of Google+ is part of a broader review of privacy practices by Google that has determined the company needs tighter controls on several major products, the people said.

. . . .

The episode involving Google+, which hasn’t been previously reported, shows the company’s concerted efforts to avoid public scrutiny of how it handles user information, particularly at a time when regulators and consumer privacy groups are leading a charge to hold tech giants accountable for the vast power they wield over the personal data of billions of people.

The snafu threatens to give Google a black eye on privacy after public assurances that it was less susceptible to data gaffes like those that have befallen Facebook. It may also complicate Google’s attempts to stave off unfavorable regulation in Washington. Mr. Pichai recently agreed to testify before Congress in the coming weeks.

“Whenever user data may have been affected, we go beyond our legal requirements and apply several criteria focused on our users in determining whether to provide notice,” a Google spokesman said in a statement.

In weighing whether to disclose the incident, the company considered “whether we could accurately identify the users to inform, whether there was any evidence of misuse, and whether there were any actions a developer or user could take in response,” he said. “None of these thresholds were met here.”

The internal memo from legal and policy staff says the company has no evidence that any outside developers misused the data but acknowledges it has no way of knowing for sure.

. . . .

In the announcement expected on Monday, Google plans to clamp down on the data it provides outside developers through APIs, two people briefed on the matter said. The company will stop letting most outside developers gain access to SMS messaging data, call log data and some forms of contact data on Android phones, and Gmail will only permit a small number of developers to continue building add-ons for the email service, the people said.

Google faced pressure to rein in developer access to Gmail earlier this year, after a Wall Street Journal examination found that developers commonly use free email apps to hook users into giving up access to their inboxes without clearly stating what data they collect. In some cases, employees at these app companies have read people’s actual emails to improve their software algorithms.

. . . .

In March of this year, Google discovered that Google+ also permitted developers to retrieve the data of some users who never intended to share it publicly, according to the memo and two people briefed on the matter. Because of a bug in the API, developers could collect the profile data of their users’ friends even if that data was explicitly marked nonpublic in Google’s privacy settings, the people said.

During a two-week period in late March, Google ran tests to determine the impact of the bug, one of the people said. It found 496,951 users who had shared private profile data with a friend could have had that data accessed by an outside developer, the person said. Some of the individuals whose data was exposed to potential misuse included paying users of G Suite, a set of productivity tools including Google Docs and Drive, the person said. G Suite customers include businesses, schools and governments.

. . . .

The bug existed since 2015, and it is unclear whether a larger number of users may have been affected over that time.

Google believes up to 438 applications had access to the unauthorized Google+ data, the people said. Strobe investigators, after testing some of the apps and checking to see if any of the developers had previous complaints against them, determined none of the developers looked suspicious, the people said. The company’s ability to determine what was done with the data was limited because the company doesn’t have “audit rights” over its developers, the memo said. The company didn’t call or visit with any of the developers, the people said.

The question of whether to notify users went before Google’s Privacy and Data Protection Office, a council of top product executives who oversee key decisions relating to privacy, the people said.

Internal lawyers advised that Google wasn’t legally required to disclose the incident to the public, the people said. Because the company didn’t know which developers might have what data, the group also didn’t believe notifying users would give any actionable benefit to the end users, the people said.

The memo from legal and policy staff wasn’t a factor in the decision, said a person familiar with the process, but reflected internal disagreements over how to handle the matter.

The document shows Google officials knew that disclosure could have serious ramifications. Revealing the incident would likely result “in us coming into the spotlight alongside or even instead of Facebook despite having stayed under the radar throughout the Cambridge Analytica scandal,” the memo said. It “almost guarantees Sundar will testify before Congress.”

. . . .

Google could also face class action lawsuits over its decision not to disclose the incident, Mr. Saikali said. “The story here that the plaintiffs will tell is that Google knew something here and hid it. That by itself is enough to make the lawyers salivate,” he said.

Link to the rest at The Wall Street Journal
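The Journal describes the Google+ bug as follows: apps could read friends’ profile fields that had been shared only with friends and never marked public. That kind of flaw can be illustrated with a hypothetical sketch. This is not Google’s code or API. The Profile type, the "public"/"friends" visibility values and both function names are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    # field name -> (value, visibility); visibility is "public" or "friends" (assumption)
    fields: dict
    friends: set = field(default_factory=set)


def app_view_of_friends_buggy(profiles, user):
    # Flawed: a third-party app acting for `user` inherits everything the user
    # can see, including fields friends shared only with friends.
    return {
        friend: {k: v for k, (v, _vis) in profiles[friend].fields.items()}
        for friend in sorted(profiles[user].friends)
    }


def app_view_of_friends_fixed(profiles, user):
    # Intended behavior: an outside app gets only fields marked public.
    return {
        friend: {k: v for k, (v, vis) in profiles[friend].fields.items() if vis == "public"}
        for friend in sorted(profiles[user].friends)
    }


# Usage: Bob shares his occupation with friends only.
profiles = {
    "alice": Profile({"name": ("Alice", "public")}, friends={"bob"}),
    "bob": Profile({"name": ("Bob", "public"), "occupation": ("Nurse", "friends")},
                   friends={"alice"}),
}
print(app_view_of_friends_buggy(profiles, "alice"))  # {'bob': {'name': 'Bob', 'occupation': 'Nurse'}}
print(app_view_of_friends_fixed(profiles, "alice"))  # {'bob': {'name': 'Bob'}}
```

In the buggy version, Bob never interacted with Alice’s app, yet his friends-only data reaches its developer. This is the pattern the memo could not rule out having been exploited.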

PG notes that Google removed its well-known motto, “Don’t be evil,” from the preface of its Code of Conduct earlier this year.

Given all the negative publicity surrounding large tech firms in recent months, PG wonders if, as a group, tech startups are more reliable than the established tech giants. He also notes that once the code underlying a product or service becomes sufficiently complex, locating and identifying flaws in that code becomes increasingly difficult.

Twenty Years with Google Books Searches

Google is celebrating its 20th anniversary.

One of the ways it is doing so is by tracking the most commonly searched topics year by year, broken into various areas of interest.

One of these areas is Books.

The Books page provides you with a variety of insights including the following:

  • Searches for To Kill a Mockingbird peaked from 2014 to 2018.
  • Lolita was the most searched book from 1999 to 2002.
  • Hamlet was the third most searched book of 1999, 2001, 2002 and 2005.

There is also an Authors page, which shows that in each of the past twenty years the most common author search was for Martin Luther King.

Here’s a link to insights about Google searches on Les Misérables

You can hover over and click on any portion of the resulting graphs for additional information.

Is Google Attempting to Hack the EU Parliament with Robo Calls, Emails and Fake News?

From The Trichordist:

Think it’s a coincidence that Google’s search algorithm returns exclusively negative or outright fake news on the EU’s proposed copyright revisions?

Google is the first imperialist power of the 21st century. It has no qualms about subverting democratic processes whenever those processes threaten its profits. Most of the time we see these power grabs in the US. For instance, Google used stolen emails to derail a Mississippi state investigation into its advertising practices. Most recently, Google used its pet Senator (Ron Wyden) to try to derail an anti-child-sex-trafficking bill. Wyden was one of only two Senators to oppose the overwhelmingly popular bill. WTF, right? Makes you wonder what they have on him.

There are so many cases of Google strong-arming government officials that it would take fifty pages to list them all. Suffice it to say that in almost all these cases Google upends democratic processes when government actions in some small way threaten Google’s internet advertising and web hosting businesses. From Google’s perspective it makes sense, as Google is willing to monetize any and all web traffic with no oversight and with no regard to how abhorrent that traffic may be. Google does not give a shit that it may be enabling child prostitution rings, the opioid crisis, or radicalizing lone wolf terrorists. Any regulation that requires even minimal oversight and might cut into Google’s $110 billion yearly profit (profit, not revenue) is attacked by Google’s vast network of lobbyists, astroturf groups, Google-funded think tanks, paid bloggers, and academics.

Over the last few years we have seen Google turn its efforts toward subverting democratic processes outside the US. In some ways it has been more effective in places like the EU, where people are unaccustomed to the kind of subversive political/academic/NGO practices honed by Big Tobacco. In the U.S. we have been partially inoculated. Europeans fall for this shit hook, line and sinker.

Case in point.

The EU Parliament’s legal affairs committee recently voted to approve a new copyright directive giving authors, performers and songwriters much more control over how their work appears online. The directive would require online platforms to proactively manage their platforms so that creators could decide when and if their content appears on digital platforms and under what financial terms.

This does not make Google/YouTube very happy, because they currently enjoy a massive subsidy from creators: they essentially use whatever they want, whenever they want. As usual, they claim that it is their “users” who are doing the infringing, not Google. Never mind that Google is making billions slinging ads against all this unlicensed content.

. . . .

In the U.S., Google has consistently used groups like Fight for the Future. Fight for the Future purports to be a grassroots organization, but it is actually run by a Google lobbyist. Despite claiming to have millions of followers, when it tried to stage a protest in San Francisco before a copyright roundtable, it couldn’t get a single real individual to show up. Astroturf. Fake.

. . . .

During the last round of Copyright Office hearings on safe harbors we observed that the vast majority of tweets against copyright reform were coming from anonymous accounts that were only active when copyright issues were being considered. Fake.

. . . .

Fight for the Future, the astroturf group run by a Google lobbyist, has repeatedly bombarded Congress and federal agencies with identical automated emails and comments. We demonstrated that the “tool” provided on their website didn’t verify identity, allowed users from outside the US to vote, and allowed repeated voting simply by reloading the page.

Link to the rest at The Trichordist

How Google and Facebook Are Monopolizing Ideas

From The Wall Street Journal:

In early May Google banned bail-bond companies from advertising on its platforms. Such companies profit from “communities of color and low income neighborhoods when they are at their most vulnerable,” it explained in a blog post. They use “opaque financing offers that can keep people in debt for months or years.”

That Google can ban ads from an industry that offends its values is not, by itself, noteworthy. Media companies have long decided what content or ads to carry for the same reason. The difference is that even after decades of consolidation, no media company enjoys a U.S. market share as dominant as Google’s in Internet search (close to 90%) or Facebook Inc.’s in social networking. Like earlier bans on payday-loan ads, Google’s bail-bond ad ban, which Facebook copied the next day, effectively kicked an entire industry out of a major advertising channel.

The debate over whether Google, a unit of Alphabet Inc., and Facebook are too big usually revolves around economics: Do they suppress competition for goods and services? The bail-bond ad ban raises a different, and potentially more troubling, possibility: that they also undermine competition for values and ideas. While Google and Facebook claim to be neutral platforms connecting users, advertisers and content providers, decisions about which ads to ban and which content to delete or reclassify are inherently value-laden, even when those values are embedded in an algorithm.

Data monopolies “can actually be more dangerous than traditional monopolies,” Maurice Stucke, a law professor at the University of Tennessee, Knoxville specializing in antitrust, wrote earlier this year in Harvard Business Review. “They can affect not only our wallets but our privacy, autonomy, democracy, and well-being.”

Bail bonds aren’t a sympathetic industry. For a steep fee, agents agree to pay the court’s required bail if the client doesn’t show up for a court date. They are, however, legal and, in most states, regulated. And the industry says it serves low-income and minority clients because they are caught up in the criminal-justice system without the means to post bail on their own.

Jeff Clayton, executive director of the American Bail Coalition, whose members insure bail agents, says Google gave the industry no opportunity to comment on or appeal the ban. A Google spokeswoman declined to comment. Facebook did consult with both the industry and criminal-justice-reform groups after announcing its ban, a spokesman said.

Bail-bond agents used to advertise in the yellow pages, but as the public abandoned phone books for Google, so did the industry. “There are just no other options,” Mr. Clayton said. The ban doesn’t extend to regular search results, but it makes it harder for individual companies to stand out.

Conservatives tend to see tech companies’ progressive leanings at work in what gets banned or reclassified—for example, Facebook’s labeling of videos by two prominent supporters of President Donald Trump as “unsafe.” Bail bonds and payday loans have long been targets of progressive activist groups.

But as the companies come under growing pressure to police their platforms and weed out “fake news,” a growing range of content gets banned, labeled or deleted for often opaque or arbitrary reasons. ProPublica and Reveal, both nonprofit news publications, have had content dealing with hate groups and immigrant children, respectively, deleted or rejected by Instagram or Facebook. Video artists complain of viewership and ads being restricted because their content violated YouTube’s community standards.

Unhappy users, advertisers and content providers wouldn’t have as much to complain about if Google (which bought YouTube in 2006) and Facebook (which acquired Instagram in 2012) had strong competitors to which they could switch.

Absent such competition, expect pressure for the government to regulate it. But that’s a slippery slope. Politically appointed overseers may simply replace the companies’ judgments with their own. For that reason the Federal Communications Commission long ago gave up policing the nation’s airwaves for fairness.

Link to the rest at The Wall Street Journal 

How to Keep Google From Owning Your Online Life

From The Wall Street Journal:

About 10 minutes after I decided to try temporarily removing Google from my life—an experiment I hoped would illuminate how much Alphabet’s giant dominates online existence—I messed it all up.

I spotted a video of Donald Glover, co-star of “Solo: A Star Wars Story,” giving a Millennium Falcon tour. Even on my most careful guard, I still clicked the red play button. A few seconds in, I realized I was watching YouTube—Google’s YouTube.

Google is so woven into the fabric of the internet it’s all but impossible to avoid. It’s where billions of users find, create and store important information, where they work and distract themselves from working. You can quit Facebook or take a Twitter break and barely notice, save for an increased sense of boredom in the Starbucks line. Google, you’d miss.

But even more than other companies offering free services, Google collects astounding amounts of data about you and uses it to sell ads. I’m happy with Google, because to date there haven’t been reports of catastrophic breaches or data-sharing scandals on the level of Facebook’s Cambridge Analytica nightmare. If Google springs a leak, it could be disastrous.

. . . .

Quitting Google takes more than just typing “bing.com.” I deleted 16 apps from my phone, from Gmail to Google Maps to Google Photos. I unplugged my Google Home, yanked the Chromecast from the back of my TV, and powered down my Chromebook. Luckily I don’t own a Nest thermostat, or this would have become a construction project.

I hadn’t realized before how my life had come to revolve around Google products. To replace them, I brought in an Amazon Echo and a Microsoft Surface Laptop. I used the Notion app and Dropbox Paper for notes and documents, and switched cord-cutting allegiance from YouTube TV to Sling. I deleted the Chrome browser from all my devices, and installed Firefox in its place.

Most Google services have straightforward replacements: Microsoft’s free Office Online for Docs and Sheets; Signal for Hangouts; Evernote for Keep; and Flipboard for Google News. In many cases you can download your Google data using its Takeout service, upload it to a new app—for instance, bringing email and calendars into Outlook—and hardly miss a beat. iPhone users who switch their search engine to Bing or DuckDuckGo and use Apple’s productivity apps seldom encounter Google.

. . . .

As Google products have taken over, they’ve also become more insular and closed. Google Search tries to answer your questions without ever taking you to another site. Gmail’s best security features are a hassle to use, except for other Gmail users. The Chrome browser is the worst offender: Some Google services, like Google Earth, work only in Chrome—though Google says it’s changing that.

. . . .

Chrome commands nearly 60% market share, according to analytics company Statcounter—over four times as large as second-place Safari. It has outsize influence over the future of the web. Companies such as Airbnb and Bank of America have directed users to Chrome for the “optimized” versions of their sites. If you use a Google product in another browser, Google frequently prompts you to download Chrome. (Google says it is dedicated to supporting other browsers.)

By almost any measure, Google collects more data than Facebook. I recommend doing a thorough audit of your My Activity page, which displays everything Google watches you do. You should also manage and delete data through Google’s privacy and security checkups.

On a recent day, Google tracked me in 468 different activities—many that had nothing to do with Google, except that I did them using a Chromebook, Android phone or Chrome browser.

Link to the rest at The Wall Street Journal 

The Antitrust Case Against Facebook, Google, Amazon and Apple

From The Wall Street Journal:

Standard Oil Co. and American Telephone and Telegraph Co. were the technological titans of their day, commanding more than 80% of their markets.

Today’s tech giants are just as dominant: In the U.S., Alphabet Inc.’s Google drives 89% of internet search; 95% of young adults on the internet use a Facebook Inc. product; and Amazon.com Inc. now accounts for 75% of electronic book sales. Those firms that aren’t monopolists are duopolists: Google and Facebook absorbed 63% of online ad spending last year; Google and Apple Inc. provide 99% of mobile phone operating systems; while Apple and Microsoft Corp. supply 95% of desktop operating systems.

A growing number of critics think these tech giants need to be broken up or regulated as Standard Oil and AT&T once were. Their alleged sins run the gamut from disseminating fake news and fostering addiction to laying waste to small towns’ shopping districts. But antitrust regulators have a narrow test: Does their size leave consumers worse off?

By that standard, there isn’t a clear case for going after big tech—at least for now. They are driving down prices and rolling out new and often improved products and services every week.

That may not be true in the future: if market dominance means fewer competitors and less innovation, consumers will be worse off than if those companies had been restrained. “The impact on innovation can be the most important competitive effect” in an antitrust case, says Fiona Scott Morton, a Yale University economist who served in the Justice Department’s antitrust division under Barack Obama.

. . . .

“Forty percent of Google search is local,” says Luther Lowe, Yelp’s head of public policy. “There should be hundreds of Yelps. There’s not. No one is pitching investors to build a service that relies on discovery through Facebook or Google to grow, because venture capitalists think it’s a poor bet.”

There are key differences between today’s tech giants and monopolists of previous eras. Standard Oil and AT&T used trusts, regulations and patents to keep out or co-opt competitors. They were respected but unloved. By contrast, Google and Facebook give away their main product, while Amazon undercuts traditional retailers so aggressively it may be holding down inflation. None enjoys a government-sanctioned monopoly; all invest prodigiously in new products. Alphabet plows 16% of revenue back into research and development; for Facebook it’s 21%. Both ratios are far higher than at other companies. All are among the public’s most loved brands, according to polls by Morning Consult.

Yet there are also important parallels. The monopolies of old and of today were built on proprietary technology and physical networks that drove down costs while locking in customers, erecting formidable barriers to entry. Just as Standard Oil and AT&T were once critical to the nation’s economic infrastructure, today’s tech giants are gatekeepers to the internet economy. If they’re imposing a cost, it may not be what customers pay but the products they never see.

. . . .

The story of AT&T is similar. It owed its early growth and dominant market position to Alexander Graham Bell’s 1876 patent for the telephone. After the related patents expired in the 1890s, new exchanges sprang up in countless cities to compete.

Competition was a powerful prod to innovation: Independent companies, by installing twisted copper lines and automatic switching, forced AT&T to do the same. But AT&T, like today’s tech giants, had “network effects” on its side.

“Just like people joined Facebook because everyone else was on Facebook, the biggest competitive advantage AT&T had was that it was interconnected,” says Milton Mueller, a professor at the Georgia Institute of Technology who has studied the history of technology policy.

Early in the 20th century, AT&T began buying up local competitors and refusing to connect independent exchanges to its long-distance lines, arousing antitrust complaints. By the 1920s, it was allowed to become a monopoly in exchange for universal service in the communities it served. By 1939, the company carried more than 90% of calls.

Though AT&T’s research unit, Bell Labs, became synonymous with groundbreaking discoveries, in telephone innovation AT&T was a laggard. To protect its own lucrative equipment business it prohibited innovative devices such as the Hush-a-Phone, which kept others from overhearing calls, and the Carterfone, which patched calls over radio airwaves, from connecting to its network.

After AT&T agreed in 1982 to be broken up into separate local and long-distance companies (the divestiture took effect in 1984), telecommunication innovation blossomed, spreading to digital switching, fiber optics, cellphones—and the internet.

Link to the rest at The Wall Street Journal

Why Amazon is the new Microsoft

From TechConnect:

A few years ago, chatbots were supposed to take over as a leading way to interact with the internet. They would live on our phones and in our messaging apps. Whenever we needed anything, all we had to do was type out a question.

Things are turning out … differently.

Chatbots, bots, virtual assistants and agents are all about the conversational UI — about interacting with a computer through natural-language words and sentences.

The conventional wisdom used to be that the chatbot revolution would be driven by pre-emption, interjection and agency, as exemplified by Facebook M and Google Now.

Instead, the killer features are hands-free voice interaction and ubiquity — the main strengths of the Amazon Alexa platform.

. . . .

Facebook M is dead.

Facebook plans to close its M chatbot service on Jan. 19.

Facebook M, which launched in August 2015, was experimental, available to only 10,000 people in Silicon Valley.

When M first emerged, it was widely assumed to represent the future of how chatbots should and would work.

. . . .

Google Now is dead, too — sort of.

A few years ago, the conventional wisdom in tech circles was that Google Now was the most sophisticated virtual assistant.

Google Now was introduced in Android in the summer of 2012.

The best thing about Google Now was pre-emption: Display cards would pop up to alert you to things (rather than waiting for you to ask). Google Now used your location, calendar and, above all, Gmail messages to figure out what kind of help you needed, and it would try to give you that help with suggestion cards. One of its best tricks was to see on your calendar where you were going, check your current location, check the traffic between those locations, and give you advice about when to leave.

Meanwhile, the coolest feature of Google Assistant is interjection, which means it will pay attention to conversations in Allo and make suggestions based on the conversation.

Unfortunately, hardly anyone uses Allo, and so the amazing interjection powers of the Google Assistant are largely unknown and generally unused by the larger public.

. . . .

A couple of years ago, Amazon Alexa was considered to be the weakest and least sophisticated chatbot or virtual assistant on the market. (Oddly, MS-DOS and, later, Microsoft Windows initially had similar reputations.)

While agency, including the ability to buy things, was once assumed to be an important feature of a virtual assistant, it’s clear even for Alexa that buying things is secondary.

According to an Experian study last year, fewer than one-third of surveyed Echo owners have ever bought something through Alexa.

The vast majority of tasks involve setting a timer, playing a song, reading the news, checking the time — really, the most basic functions of a smartphone made convenient by voice interaction.

And yet Amazon is clearly dominating the space. This week’s CES showed that the industry is following Amazon’s lead.

Alexa appeared at the show inside projectors, ceiling lights, cars, glasses, showers, washing machines, earbuds, speakers — and even Windows 10 PCs.

. . . .

Amazon CEO Jeff Bezos this week became the world’s richest person, according to the Forbes list. Over the past few decades, that spot had usually been occupied by former Microsoft CEO Bill Gates.

The symbolism is timely; it was at CES this week that Amazon became the new Microsoft.

Microsoft rose to dominance by controlling the operating system that the majority of people and businesses used.

Amazon is now doing something similar with Alexa. While Alexa isn’t even close to becoming as important as Windows, it is becoming the operating system of the post-PC, post-smartphone future.

The reason is very simple, and perfectly described by Sam Dolnick, who oversees digital initiatives at The New York Times. He said: “We are living in a world where the mobile phone is dominant, and audio, which doesn’t require your eyes or your hands, is the ultimate mobile medium.”

Link to the rest at TechConnect

Evaluation of Speech for the Google Assistant

From the Google Research Blog:

Voice interactions with technology are becoming a key part of our lives — from asking your phone for traffic conditions to work to using a smart device at home to turn on the lights or play music. The Google Assistant is designed to provide help and information across a variety of platforms, and is built to bring together a number of products — including Google Maps, Search, Google Photos, third-party services, and more. For some of these products, we have released specific evaluation guidelines, like Search Quality Rating Guidelines. However, the Google Assistant needs its own guidelines in place, as many of its interactions utilize what is called “eyes-free technology,” when there is no screen as part of the experience.

In the past we have received requests to see our evaluation guidelines from academics who are researching improvements in voice interactions, question answering and voice-guided exploration. To facilitate their evaluations, we are publishing some of the first Google Assistant guidelines. It is our hope that making these guidelines public will help the research community build and evaluate their own systems.

Creating the Guidelines
For many queries, responses are presented on the display (like a phone) with a graph, a table, or an interactive element, like you’d see for [weather this weekend].

But spoken responses are very different from display results, as what’s on screen needs to be translated into useful speech. Furthermore, the contents of the voice response are sometimes sourced from the web, and in those cases it’s important to provide the user with a link to the original source. While users looking at their mobile device can click through to read the original web page, an eyes-free solution presents unique challenges. In order to generate the optimal audio response, we use a combination of explicit linguistic knowledge and deep learning solutions that allow us to keep answers grammatical, fluent and concise.

. . . .

Formulation: it is much easier to understand a badly formulated written answer than an ungrammatical spoken answer, so more care has to be placed in ensuring grammatical correctness.

Link to the rest at the Google Research Blog

 

Google Offers Hand to News Publishers

From The Wall Street Journal:

Google is rolling out a package of new policies and services to help news publishers increase subscriptions, a move likely to warm its icy relationship with some of the biggest critics of its power over the internet.

Google said it will end its decade-old “first click free” policy this week. The policy, which required news websites to give readers free access to articles from Google’s search results, upset publishers that require subscriptions, who believed it undercut their efforts to get readers to pay for news.

Google, a unit of Alphabet Inc., said it also plans tools to help increase subscriptions, including enabling users to log in with their Google passwords to simplify the subscription process and sharing user data with news organizations to better target potential subscribers.

With billions of people using its search, YouTube and other web properties, Google has an outsize influence on a wealth of industries and modern society.

. . . .

The new publisher rules are good news for the print industry, which has largely struggled to convert its business model to the internet as print advertising sales have plummeted in the digital age. Google and Facebook dominate the internet ad industry, and news organizations are increasingly reliant on those two tech giants for web traffic. Google says it drives 10 billion clicks a month to publishers’ sites.

Some newspapers even asked Congress this year to exempt them from antitrust laws so they could negotiate collectively with the tech giants.

. . . .

“We really recognize the transition to digital for publishers hasn’t been easy,” Google Chief Business Officer Philipp Schindler said in an interview. He said a strong news industry boosts the utility of Google search and helps Google’s ad business, which sells ads on news sites. “The economics are pretty clear: If publishers aren’t successful, we can’t be successful.”

. . . .

Kinsey Wilson, the former executive editor of USA Today who now advises New York Times Co., said publishers must be careful about letting Google be the middleman to their readers. “Google can remove some friction,” he said, “but publishers have to stay vigilant.”

Link to the rest at The Wall Street Journal

Amazon Seeks to Defend Alexa’s Lead as Competition Heats Up

From The Wall Street Journal:

Amazon.com Inc. is pouring more resources into Alexa to maintain its edge as competition heats up among artificial-intelligence assistants, according to people familiar with the company’s thinking.

Amazon is adding hundreds of engineers to the Alexa program and giving it hiring preference over other divisions, the people said. It has also put Tom Taylor, a veteran Amazon executive known for scaling high-growth operations, in charge of the business, after the former Alexa chief retired.

Alexa powers Amazon’s Echo speaker device, which was the first of its kind when it was launched nearly three years ago. The Echo has about three-quarters of the U.S. market for smart speakers, with more than 11 million total devices sold through the end of last year, according to analyst estimates.

. . . .

On Wednesday, Amazon said it was teaming up with Microsoft to allow their voice-enabled digital assistants to work together, a move analysts said would help boost Alexa’s content. But the two won’t be sharing data.

Amazon draws its user data primarily from its retail website and its Echo devices. The retail data gives it a big advantage when it comes to shopping and other related machine learning.

Google gathers data from its widely used search engine in addition to its Android operating system for smartphones. It is directing its voice assistant at Amazon’s stronghold, e-commerce, with the partnership it announced with Wal-Mart Stores Inc. last week. Users of its Google Express shopping service will be able to order from the retail giant by voice via Google’s virtual assistant.

. . . .

Amazon also integrated Alexa with outside developers and products, adding more than 15,000 skills that allow consumers to ask Alexa to digitally shake a Magic 8-ball or turn on the lights. Alexa has also been added to devices ranging from Ford Motor Co. cars to Sears Holdings Corp.’s Kenmore refrigerators, which Google is also trying to do.

Still, Amazon needs Alexa to keep getting smarter. Josh Vickerson picked up a Dot late last year when it was on sale for $35. The 24-year-old originally tried out features like playing Jeopardy and integrating his Fitbit. Now he uses it to play videos, set timers and turn off the lights.

Link to the rest at The Wall Street Journal

Casa PG obtained its first Echo (which is currently marked down to $99) not long after it was first released. Since then, a flock of little Dots (also marked down) have shown up.

PG doesn’t claim to be a power user, but does have enough Dots that one is always within easy reach of his voice. They’re turning out to be very helpful, even though PG uses them for far fewer services than they can provide.

PG understands that his Echo and Dots will soon be able to control Casa PG’s Sonos speakers (very highly recommended). That will be wonderful. PG probably needs to buy a couple more Dots so whenever he whispers a command, Sonos starts playing.

Are the tech giants too big to be good partners for book publishing?

From veteran publishing consultant Mike Shatzkin:

An online discussion forum that includes publishers and librarians and tech people usually sends me several emails a day. About 10 days ago, a conversation evolved about Google Book Search and the Google Library Project, two initiatives by the search giant that were initiated in the early part of the last decade.

Because both programs essentially gave Google a trove of book-published content for full text search, there was a wariness among the publishing community about them when they started. In time, publishers (through the AAP) sued Google and the course of the lawsuit ultimately led to a sharp curtailment of Google’s ability to just do the scanning. After a while, it appears the reservoir of interest at Google for the project, which started as more of a “service to humanity” idea than a profitable one, just evaporated. The scans that Google had already done became part of the HathiTrust repository of content, an important research and scholarship tool in the non-trade world without any recognition or impact on the trade world at all.

. . . .

And, of course, Google is the single most powerful source of “discovery” and many in publishing wonder if books overall would have benefited from Google being more “knowledgeable” about what is inside of them.

So, to this day, years after the litigation and the scanning program have concluded, there is a division of opinion in the publishing community. Some see Google as a bully and a villain, trying to make its own rules to benefit from publishers’ content and crippling the value of copyright. Others focus on the lost opportunity and believe publishers would actually have more valuable intellectual property (more valuable copyrights!) today if they’d just allowed the Google programs to develop and flourish.

. . . .

In the course of the discussion, a very knowledgeable and experienced veteran of publishing across education, professional, and trade offered the comment that “Google is a terrible partner.” I asked him (offline from the group discussion; he’s a friend) to amplify that.

My points of context for Google weren’t in publishing; they were in tech. My own most extensive experience with the big three tech companies that publishers dealt with — Amazon, Apple, and Google — was working out their participation at publishing conferences.

. . . .

What I saw was that Apple was the most uptight; it was hard to get speakers because messaging was so tightly controlled by upper management.

Amazon would sometimes be very agreeable, but primarily when they had an agenda: some program they wanted to get across or some point they wanted to make. So they were often cooperative, but very much on their terms to put across their message du jour. In general, they wouldn’t do panels or Q&As. They needed to control the conversation and skillfully avoided being pushed to publicly discuss anything they didn’t want to talk about. But they were often available and always interesting, and unlike Apple (in my experience), would engage with you honestly about their agenda.

. . . .

Google was, in my experience, by far the most open and accessible of the three companies. You could tell them you wanted speakers or panelists to cover one subject or another and you’d get directed to people who could help you. And Google employed a pretty fair number of ex-publishing people who were conversant about issues from a perspective that publishers could relate to.

. . . .

What my friend said in response to my inquiry, in which I had only mentioned Google, was, “Google, Apple, and Amazon are all bad partners. Ingram, Baker & Taylor, and Firebrand are good partners.”

So much for my contextual frame.

But grouping the three to me made the point that my context was what mattered. Ingram, Baker & Taylor, and Firebrand all make their living in the book business. Google, Apple, and Amazon have a financial stake in the book business that amounts to a small rounding error to their overall financial performance.

. . . .

For the entire life of the book business until about fifteen minutes ago, it was very much a free-standing industry. The only larger-than-the-industry enterprises it had to deal with were the Post Office and United Parcel Service. Our authors, designers, typesetters, printers, and, most important of all, customers to which we shipped directly (the wholesalers and retailers and libraries) were part of the publishers’ world. They depended on the publishers as much as the publishers depended on them.

Amazon was the first piece of evidence — and still the most important piece of evidence — that the old world has disappeared.  . . . . They sell more than half of the books for most publishers, but all the books they sell probably amount to less than 5 percent of their total margin. And while Penguin Random House may be in the neighborhood of half the consumer book sales overall, they wouldn’t amount to nearly that big a percentage of Amazon’s book sales because Amazon gets a disproportionate share of professional and other niche markets and thus from publishers who don’t compete at all with PRH in the consumer market.

And because Amazon has very intentionally created a whole massive pool of consumer books that nobody else has, through their own publishing and enabling independent authors.

Link to the rest at The Shatzkin Files

PG has had direct business/legal dealings and negotiations with Apple and Amazon over the last 15 years or so. For context, he has also had business negotiations with Microsoft, Oracle, Hewlett-Packard and Intel in the tech world plus every major investment bank in New York (Goldman Sachs, Morgan Stanley, etc., etc.), most of the large accounting firms plus Disney, American Express and a bunch of other big companies.

To be clear, this doesn’t mean PG knows everything about negotiating intellectual property partnerships and other deals with large organizations, but he does know some things about that subject.

PG definitely has not represented any large publishers in their dealings with large tech companies. He has, however, represented a lot of authors in their dealings with large publishers.

Speaking generally, large publishers are not cut out to be good partners for tech companies.

Publishers are simply too rigid in their business vision and very much focused on the short term (which is strange for organizations that license copyrights, which extend far into the future).

This short-term outlook is substantially affected by the fact that the Big Five publishers are all owned and controlled by other and larger media conglomerates. Three of the Big Five are owned by large European media corporations that are not known for their commitment to innovation and could not be described as tech-savvy in any sense. HarperCollins is owned by News Corp, and the fifth Big Five publisher, Simon & Schuster, is owned by CBS.

Each of these media conglomerates is heavily focused on this quarter’s and this year’s income, expenses and profits. They’re not what anyone would call forward-looking or focused on the long term. If they think about the long term at all, they’re convinced it will not be much different than last quarter.

(PG worked for a major subsidiary of a very, very large international media conglomerate for three unhappy years and knows that of which he speaks.)

This means that if Google sends someone to talk to the President of a Big Five publisher, Google is talking to a middle manager in a much larger business organization. The Big Five President can do pretty much whatever he/she wants to do with Barnes & Noble and Ingram (as long as it doesn’t have an adverse impact on profits), but cutting a strategic deal with Google is way, way out of his/her job description.

Organizations like Google, Apple and Amazon quickly become frustrated with organizations that are not able to move rapidly.

Amazon’s Alexa Has A Data Dilemma: Be More Like Apple Or Google?

From Fast Company:

Devices like Amazon Echo could someday turn into a treasure trove for developers that make voice assistant skills, but first companies have to figure out where they draw the line when it comes to weighing data sharing against consumer privacy.

Now that dilemma is heating up: Citing three unnamed sources, The Information reported this week that Amazon is considering whether to provide full conversation transcripts to Alexa developers. This would be a major change from Amazon’s current policy in which the company only provides basic information—such as the total number of users, the average number of actions they’ve performed, and rates of success or failure for voice commands. Amazon declined to comment to The Information regarding the claims, but the change wouldn’t be unprecedented. Google’s voice assistant platform already provides full transcripts to developers.

The potential move by Amazon underscores how it is caught between two worlds with its Alexa assistant, especially in regards to privacy. By keeping transcripts to itself, Amazon can better protect against the misuse of its customers’ data and avoid concerns about eavesdropping. But because Alexa already gives developers the freedom to build virtually any kind of voice skill, their inability to see what customers are saying becomes a major burden.

. . . .

With Google Assistant, developers can view a transcript for any conversation with their particular skill. Uber, for example, can look at all recorded utterances from the moment you ask for a car until the ride is confirmed. (It can’t, however, see what you’ve said to other apps and services.) Google’s own documentation confirms this, noting that developers can request “keyboard input or spoken input from end user” during a conversation.

For developers, this data can be of immense utility. It allows them to find out if users are commonly speaking in the wrong syntax, or asking to do things that the developer’s voice skill doesn’t support.

. . . .

In terms of sharing data with developers, Apple’s Siri voice assistant is on the opposite side of the spectrum from Google. Developers who work with SiriKit get no information about usage from Apple, not even for basic things like how many people use voice commands to access an app, or which voice commands are most commonly used.

. . . .

But keep in mind that Siri’s approach to third-party development is entirely different from that of Google and Amazon. Instead of letting developers build any kind of voice application, Apple only supports third-party voice commands in a handful of specific domains, such as photo search, workouts, ride hailing, and messaging. And instead of letting those apps drive the conversation, Apple controls the back-and-forth itself. The apps merely provide the data and some optional on-screen information.

Because these apps don’t communicate with users directly, there’s no need for them to have conversation transcripts in the first place. Instead, Apple can look at what users are trying to accomplish and use that data to expand Siri on its own.

The downside to this approach is that Siri just isn’t as useful as other virtual assistants.

Link to the rest at Fast Company 

If PG lived in China, he would be inclined not to use Alexa.

Apple Is Manufacturing a Siri Speaker to Outdo Google and Amazon

From Bloomberg:

Apple Inc. is already in your pocket, on your desk and underneath your television. Soon, a device embossed with “Designed by Apple in California” may be on your nightstand or kitchen counter as well.

The iPhone-maker has started manufacturing a long-in-the-works Siri-controlled smart speaker, according to people familiar with the matter. Apple could debut the speaker as soon as its annual developer conference in June, but the device will not be ready to ship until later in the year, the people said.

The device will differ from Amazon.com Inc.’s Echo and Alphabet Inc.’s Google Home speakers by offering virtual surround sound technology and deep integration with Apple’s product lineup, said the people, who requested anonymity to discuss products that aren’t yet public.

Introducing a speaker would serve two main purposes: providing a hub to automate appliances and lights via Apple’s HomeKit system, and establishing a bulwark inside the home to lock customers more tightly into Apple’s network of services. That would help combat the competitive threat from Google’s and Amazon’s connected speakers: the Home and Echo mostly don’t support services from Apple. Without compatible hardware, users may be more likely to opt for the Echo or Home, and therefore use streaming music offerings such as Spotify, Amazon Prime Music or Google Play rather than Apple Music.

Link to the rest at Bloomberg

PG says competition keeps competitors sharp and is great for consumers.

Tech’s Frightful Five: They’ve Got Us

From The New York Times:

A few weeks ago, I bought a new television. When the whole process was over, I realized something incredible: To navigate all of the niggling details surrounding this one commercial transaction — figuring out what to buy, which accessories I needed, how and where to install it, and whom to hire to do so — I had dealt with only a single ubiquitous corporation: Amazon.

It wasn’t just the TV. As I began combing through other recent household decisions, I found that in 2016, nearly 10 percent of my household’s commercial transactions flowed through the Seattle retailer, more by far than any other company my family dealt with. What’s more, with its Echos, Fire TV devices, audiobooks, movies and TV shows, Amazon has become, for my family, more than a mere store. It is my confessor, my keeper of lists, a provider of food and culture, an entertainer and educator and handmaiden to my children.

. . . .

This is the most glaring and underappreciated fact of internet-age capitalism: We are, all of us, in inescapable thrall to one of the handful of American technology companies that now dominate much of the global economy. I speak, of course, of my old friends the Frightful Five: Amazon, Apple, Facebook, Microsoft and Alphabet, the parent company of Google.

The five are among the most valuable companies on the planet, collectively worth trillions.

. . . .

[L]ast week I came up with a fun game: If an evil, tech-phobic monarch forced you to abandon each of the Frightful Five, in which order would you do so, and how much would your life deteriorate as a result?

. . . .

When I went through the thought experiment, I found that dropping the first couple of tech giants was pretty easy — but after that the process became progressively more unbearable. For me, Facebook was the first to go. I tend to socialize online using Twitter, Apple’s messaging system, and Slack, the office-chat app, so losing Mark Zuckerberg’s popular service (and its subsidiaries, Instagram, WhatsApp and Messenger) was not such a big deal.

Next, for me, was Microsoft, which I found slightly more difficult to quit. I don’t normally use any Windows devices, but Microsoft’s word-processing program, Word, is an essential tool for me, and I’d hate to lose it.

In third place, full of regrets: Apple. There’s nothing I use more than my iPhone, and close behind are my MacBook and iMac 5K, which may be the best computer I’ve ever owned. Abandoning Apple would prompt deep and truly annoying rearrangements in my life, including braving Samsung’s bad software. But I could do it, grudgingly.

Link to the rest at The New York Times

Torching the Modern-Day Library of Alexandria

From The Atlantic:

You were going to get one-click access to the full text of nearly every book that’s ever been published. Books still in print you’d have to pay for, but everything else—a collection slated to grow larger than the holdings at the Library of Congress, Harvard, the University of Michigan, at any of the great national libraries of Europe—would have been available for free at terminals that were going to be placed in every local library that wanted one.

At the terminal you were going to be able to search tens of millions of books and read every page of any book you found. You’d be able to highlight passages and make annotations and share them; for the first time, you’d be able to pinpoint an idea somewhere inside the vastness of the printed record, and send somebody straight to it with a link. Books would become as instantly available, searchable, copy-pasteable—as alive in the digital world—as web pages.

It was to be the realization of a long-held dream. “The universal library has been talked about for millennia,” Richard Ovenden, the head of Oxford’s Bodleian Libraries, has said. “It was possible to think in the Renaissance that you might be able to amass the whole of published knowledge in a single room or a single institution.” In the spring of 2011, it seemed we’d amassed it in a terminal small enough to fit on a desk.

“This is a watershed event and can serve as a catalyst for the reinvention of education, research, and intellectual life,” one eager observer wrote at the time.

On March 22 of that year, however, the legal agreement that would have unlocked a century’s worth of books and peppered the country with access terminals to a universal library was rejected under Rule 23(e)(2) of the Federal Rules of Civil Procedure by the U.S. District Court for the Southern District of New York.

When the library at Alexandria burned it was said to be an “international catastrophe.” When the most significant humanities project of our time was dismantled in court, the scholars, archivists, and librarians who’d had a hand in its undoing breathed a sigh of relief, for they believed, at the time, that they had narrowly averted disaster.

. . . .

Google’s secret effort to scan every book in the world, codenamed “Project Ocean,” began in earnest in 2002 when Larry Page and Marissa Mayer sat down in the office together with a 300-page book and a metronome. Page wanted to know how long it would take to scan more than a hundred million books, so he started with one that was lying around. Using the metronome to keep a steady pace, he and Mayer paged through the book cover-to-cover. It took them 40 minutes.

Page had always wanted to digitize books. Way back in 1996, the student project that eventually became Google—a “crawler” that would ingest documents and rank them for relevance against a user’s query—was actually conceived as part of an effort “to develop the enabling technologies for a single, integrated and universal digital library.” The idea was that in the future, once all books were digitized, you’d be able to map the citations among them, see which books got cited the most, and use that data to give better search results to library patrons. But books still lived mostly on paper. Page and his research partner, Sergey Brin, developed their popularity-contest-by-citation idea using pages from the World Wide Web.

By 2002, it seemed to Page like the time might be ripe to come back to books. With that 40-minute number in mind, he approached the University of Michigan, his alma mater and a world leader in book scanning, to find out what the state of the art in mass digitization looked like. Michigan told Page that at the current pace, digitizing their entire collection—7 million volumes—was going to take about a thousand years. Page, who’d by now given the problem some thought, replied that he thought Google could do it in six.

. . . .

He offered the library a deal: You let us borrow all your books, he said, and we’ll scan them for you. You’ll end up with a digital copy of every volume in your collection, and Google will end up with access to one of the great untapped troves of data left in the world. Brin put Google’s lust for library books this way: “You have thousands of years of human knowledge, and probably the highest-quality knowledge is captured in books.” What if you could feed all the knowledge that’s locked up on paper to a search engine?

By 2004, Google had started scanning. In just over a decade, after making deals with Michigan, Harvard, Stanford, Oxford, the New York Public Library, and dozens of other library systems, the company, outpacing Page’s prediction, had scanned about 25 million books. It cost them an estimated $400 million. It was a feat not just of technology but of logistics.

. . . .

The stations—which didn’t so much scan as photograph books—had been custom-built by Google from the sheet metal up. Each one could digitize books at a rate of 1,000 pages per hour. The book would lie in a specially designed motorized cradle that would adjust to the spine, locking it in place. Above, there was an array of lights and at least $1,000 worth of optics, including four cameras, two pointed at each half of the book, and a range-finding LIDAR that overlaid a three-dimensional laser grid on the book’s surface to capture the curvature of the paper. The human operator would turn pages by hand—no machine could be as quick and gentle—and fire the cameras by pressing a foot pedal, as though playing at a strange piano.

What made the system so efficient is that it left so much of the work to software. Rather than make sure that each page was aligned perfectly, and flattened, before taking a photo, which was a major source of delays in traditional book-scanning systems, cruder images of curved pages were fed to de-warping algorithms, which used the LIDAR data along with some clever mathematics to artificially bend the text back into straight lines.

. . . .

In August 2010, Google put out a blog post announcing that there were 129,864,880 books in the world. The company said they were going to scan them all.

Of course, it didn’t quite turn out that way. This particular moonshot fell about a hundred-million books short of the moon. What happened was complicated but how it started was simple: Google did that thing where you ask for forgiveness rather than permission, and forgiveness was not forthcoming. Upon hearing that Google was taking millions of books out of libraries, scanning them, and returning them as if nothing had happened, authors and publishers filed suit against the company, alleging, as the authors put it simply in their initial complaint, “massive copyright infringement.”

. . . .

As Tim Wu pointed out in a 2003 law review article, what usually becomes of these battles—what happened with piano rolls, with records, with radio, and with cable—isn’t that copyright holders squash the new technology. Instead, they cut a deal and start making money from it. Often this takes the form of a “compulsory license” in which, for example, musicians are required to license their work to the piano-roll maker, but in exchange, the piano-roll maker has to pay a fixed fee, say two cents per song, for every roll they produce. Musicians get a new stream of income, and the public gets to hear their favorite songs on the player piano. “History has shown that time and market forces often provide equilibrium in balancing interests,” Wu writes.

But even if everyone typically ends up ahead, each new cycle starts with rightsholders fearful they’re being displaced by the new technology. When the VCR came out, film executives lashed out. “I say to you that the VCR is to the American film producer and the American public as the Boston strangler is to the woman home alone,” Jack Valenti, then the president of the MPAA, testified before Congress. The major studios sued Sony, arguing that with the VCR, the company was trying to build an entire business on intellectual property theft. But Sony Corp. of America v. Universal City Studios, Inc. became famous for its holding that as long as a copying device was capable of “substantial noninfringing uses”—like someone watching home movies—its makers couldn’t be held liable for copyright infringement.

The Sony case forced the movie industry to accept the existence of VCRs. Not long after, they began to see the device as an opportunity. “The VCR turned out to be one of the most lucrative inventions—for movie producers as well as hardware manufacturers—since movie projectors,” one commentator put it in 2000.

It only took a couple of years for the authors and publishers who sued Google to realize that there was enough middle ground to make everyone happy. This was especially true when you focused on the back catalog, on out-of-print works, instead of books still on store shelves. Once you made that distinction, it was possible to see the whole project in a different light. Maybe Google wasn’t plundering anyone’s work. Maybe they were giving it a new life. Google Books could turn out to be for out-of-print books what the VCR had been for movies out of the theater.

If that was true, you wouldn’t actually want to stop Google from scanning out-of-print books—you’d want to encourage it. In fact, you’d want them to go beyond just showing snippets to actually selling those books as digital downloads.

. . . .

Those who had been at the table crafting the agreement had expected some resistance, but not the “parade of horribles,” as Sarnoff described it, that they eventually saw. The objections came in many flavors, but they all started with the sense that the settlement was handing to Google, and Google alone, an awesome power. “Did we want the greatest library that would ever exist to be in the hands of one giant corporation, which could really charge almost anything it wanted for access to it?” Robert Darnton, then president of Harvard’s library, has said.

Darnton had initially been supportive of Google’s scanning project, but the settlement made him wary. The scenario he and many others feared was that the same thing that had happened to the academic journal market would happen to the Google Books database. The price would be fair at first, but once libraries and universities became dependent on the subscription, the price would rise and rise until it began to rival the usurious rates that journals were charging. By 2011, for instance, a yearly subscription to the Journal of Comparative Neurology could cost as much as $25,910.

Although academics and library enthusiasts like Darnton were thrilled by the prospect of opening up out-of-print books, they saw the settlement as a kind of deal with the devil. Yes, it would create the greatest library there’s ever been—but at the expense of creating perhaps the largest bookstore, too, run by what they saw as a powerful monopolist. In their view, there had to be a better way to unlock all those books. “Indeed, most elements of the GBS settlement would seem to be in the public interest, except for the fact that the settlement restricts the benefits of the deal to Google,” the Berkeley law professor Pamela Samuelson wrote.

Link to the rest at The Atlantic and thanks to Valerie for the tip.

How Google Book Search Got Lost

From Backchannel:

Books can do anything. As Franz Kafka once said, “A book must be the axe for the frozen sea inside us.”

It was Kafka, wasn’t it? Google confirms this. But where did he say it? Google offers links to some quotation websites, but they’re generally unreliable. (They misattribute everything, usually to Mark Twain.)

To answer such questions, you need Google Book Search, the tool that magically scours the texts of millions of digitized volumes. Just find the little “more” tab at the top of the Google results page — it’s right past Images, Videos, and News. Then click on it, find “Books,” and click on that.

. . . .

It turns out that the “frozen sea” quote is from Kafka’s Letters to Friends, Family, and Editors, in a missive to Oskar Pollak, dated January 27, 1904.

. . . .

Google Book Search is amazing that way. When it started almost 15 years ago, it also seemed impossibly ambitious: An upstart tech company that had just tamed and organized the vast informational jungle of the web would now extend the reach of its search box into the offline world. By scanning millions of printed books from the libraries with which it partnered, it would import the entire body of pre-internet writing into its database.

“You have thousands of years of human knowledge, and probably the highest-quality knowledge is captured in books,” Google cofounder Sergey Brin told The New Yorker at the time. “So not having that — it’s just too big an omission.”

. . . .

Today, Google is known for its moonshot culture, its willingness to take on gigantic challenges at global scale. Books was, by general agreement of veteran Googlers, the company’s first lunar mission. Scan All The Books!

In its youth, Google Books inspired the world with a vision of a “library of utopia” that would extend online convenience to offline wisdom. At the time it seemed like a singularity for the written word: We’d upload all those pages into the ether, and they would somehow produce a phase-shift in human awareness. Instead, Google Books has settled into a quiet middle age of sourcing quotes and serving up snippets of text from the 25 million-plus tomes in its database.

Google employees maintain that’s all they ever intended to achieve. Maybe so. But they sure got everyone else’s hopes up.

. . . .

When I started work on this story, I feared at first that Books no longer existed as a discrete part of the Google organization — that Google had actually shut the project down. As with many aspects of Google, there’s always been some secrecy around Google Books, but this time, when I started asking questions, it closed up like a startled turtle. For weeks there didn’t seem to be anyone around or available who could or would speak to the current state of the Books effort.

The Google Books “History” page trails off in 2007, and its blog stopped updating in 2012, after which it got folded into the main Google Search blog, where information about Books is nearly impossible to find. As a functioning and useful service, Google Books remained a going concern. But as a living project, with plans and announcements and institutional visibility, it seemed to have pulled a vanishing act. All of which felt weird, given the legal victory it had finally won.

When I talked to alumni of the project who’d left Google, several mentioned that they suspected the company had stopped scanning books. Eventually, I learned that there are, indeed, still some Googlers working on Book Search, and they’re still adding new books, though at a significantly slower pace than at the project’s peak around 2010–11.

. . . .

LED lighting, not widely available at the project’s start, has helped. So has studying more efficient techniques for human operators to flip pages. “It’s almost like finger-picking on a guitar,” Jaskiewicz says. “So we find people who have great ways of turning pages — where is the thumb and that kind of stuff.”

. . . .

Like many tech-friendly bibliophiles, Sloan says he uses Google Books a lot, but is sad that it isn’t continuing to evolve and amaze us. “I wish it was a big glittering beautiful useful thing that was growing and getting more interesting all the time,” he says. He also wonders: We know Google can’t legally make its millions of books available for anyone to read in full — but what if it made them available for machines to read?

Machine-learning tools that analyze texts in new ways are advancing quickly today, Sloan notes, and “the culture around it has a real Homebrew Computer Club or early web feel to it right now.” But to progress, researchers need big troves of data to feed their programs.

“If Google could find a way to take that corpus, sliced and diced by genre, topic, time period, all the ways you can divide it, and make that available to machine-learning researchers and hobbyists at universities and out in the wild, I’ll bet there’s some really interesting work that could come out of that. Nobody knows what,” Sloan says. He assumes Google is already doing this internally. Jaskiewicz and others at Google would not say.

Link to the rest at Backchannel