On Monday morning, numerous writers woke up to learn that their books had been uploaded and scanned into a massive dataset without their consent. A project of cloud word processor Shaxpir, Prosecraft compiled over 27,000 books, comparing, ranking and analyzing them based on the “vividness” of their language. Many authors — including Young Adult powerhouse Maureen Johnson and “Little Fires Everywhere” author Celeste Ng — spoke out against Prosecraft for training a model on their books without consent. Even books published less than a month ago had already been uploaded.
After a day full of righteous online backlash, Prosecraft creator Benji Smith took down the website, which had existed since 2017.
“I’ve spent thousands of hours working on this project, cleaning up and annotating text, organizing and tweaking things,” Smith wrote. “But in the meantime, ‘AI’ became a thing. And the arrival of AI on the scene has been tainted by early use-cases that allow anyone to create zero-effort impersonations of artists, cutting those creators out of their own creative process.”
. . . .
Smith’s Prosecraft was not a generative AI tool, but authors worried it could become one, since he had amassed a dataset of a quarter billion words from published books, which he found by crawling the internet.
Prosecraft would show two paragraphs from a book, one that was “most passive” and one that was “most vivid.” It then placed the books into percentile rankings based on how vivid, how long or how passive each one was.
. . . .
“If you’re a writer as a career it’s maddening, in part because style is not the same as writing a ****** whitepaper for a business that needs to be in active voice or whatever,” author Ilana Masad said. “Style is style!”
. . . .
“Since I was only publishing summary statistics, and small snippets from the text of those books, I believed I was honoring the spirit of the Fair Use doctrine, which doesn’t require the consent of the original author,” Smith wrote. Some authors noted that the excerpts of their books on Prosecraft included major spoilers, causing further frustration.
Though Smith apologized, authors remain exasperated. For artists and writers, the recent proliferation of AI tools has created a deeply frustrating game of whack-a-mole. As soon as they opt out of one database, they find that their work has been used to train another AI model, and so on.
“It’s pretty much the norm, from what I can tell, for these sites and projects to do whatever they’re doing first and then hope that no one notices and then disappear or get defensive when they inevitably do,” Masad said.
Generative AI and the technology behind self-publishing have created a perfect storm for scammy activities. Amazon has been flooded with low-quality, AI-generated travel guides, and even AI-generated children’s books. But tools like ChatGPT are basically trained on the sum total of the internet, so this means that real travel writers or children’s book authors could be getting inadvertently plagiarized.
Author Jane Friedman wrote in a recent blog post — titled “I’d Rather See My Books Get Pirated Than This” — that she is being impersonated on Amazon, where someone is selling books under her name that appear to be written with an AI.
Friedman was successful in getting these fake books removed from her Goodreads page, but Amazon initially told her it wouldn’t remove the books for sale unless she had a trademark for her name.
After Friedman’s post went viral, Amazon removed the misleading books.
Amazon spokesperson Ashley Vanicek told TechCrunch: “We have clear content guidelines governing which books can be listed for sale and promptly investigate any book when a concern is raised. We welcome author feedback and work directly with authors to address any issues they raise and where we have made an error, we correct it. We invest heavily to provide a trustworthy shopping experience and protect customers and authors from misuse of our service.”
Though both Prosecraft and Amazon ended up capitulating to writers’ requests, there are many other cases in which writers’ concerns aren’t heard — disputes over the use of AI are one reason why Hollywood writers are currently striking, for example.
Link to the rest at TechCrunch
PG notes that the OP was published on August 7, 2023.
In the intervening time period of almost two months, AI has moved so quickly that the complaints and responses seem somewhat out of date.
PG posits that there will still be a great many human writers using their brains, AKA wetware, to create interesting and commercially successful books.
He also posits that there will certainly be litigation regarding the way creators of AI programs use books, magazines, anything with words in it, to provide grist for the AI’s mill.
PG predicts there will be some very good and insightful court decisions and some unusual and strange court decisions. 90% of the judges sitting on the bench in federal and state court systems are pretty weak when it comes to complex technology, and few technologies are more complex than AI.
PG predicts that AI is just getting warmed up, and the capabilities and complexity of future AI programs will make many current AI systems look pretty elementary and crude by comparison.
PG doesn’t do litigation anymore, but if he were still litigating and representing someone in an AI intellectual property lawsuit, he would think long and hard about a variety of striking analogies that apply to AI.
His reason for doing so is that a compelling (and accurate) analogy or group of analogies will be quite important in laying out the facts and arguments of the case in the judge’s mind.
If he persuaded the judge that his analogies were accurate, then applying relevant laws and prior court opinions would become much easier and more persuasive.
Speaking off the top of his head, PG might ask himself whether an AI program is more like a giant library or more like a very intelligent individual who has read all the books in the library and is using that information as a foundation for generating new and previously unconsidered or undiscovered ideas and expressions of ideas.
PG suggests that “inspired by” is an easily understood and well-established way that new creations have been made for centuries.
Here are the first two stanzas of the poem The Truly Great, by Stephen Spender:
I think continually of those who were truly great.
Who, from the womb, remembered the soul’s history
Through corridors of light, where the hours are suns,
Endless and singing. Whose lovely ambition
Was that their lips, still touched with fire,
Should tell of the Spirit, clothed from head to foot in song.
And who hoarded from the Spring branches
The desires falling across their bodies like blossoms.

What is precious, is never to forget
The essential delight of the blood drawn from ageless springs
Breaking through rocks in worlds before our earth.
Never to deny its pleasure in the morning simple light
Nor its grave evening demand for love.
Never to allow gradually the traffic to smother
With noise and fog, the flowering of the spirit.
Here’s an excerpt from another poem that illustrates the “inspired by” meme, Ode on a Grecian Urn, by John Keats. The narrator (or perhaps Keats) is carefully examining a beautiful ancient Greek artwork – an urn on display somewhere, perhaps in a museum. The poem is a record of some of the thoughts and feelings that come to the observer’s mind and heart during this examination.
Thou still unravish’d bride of quietness,
Thou foster-child of silence and slow time,
Sylvan historian, who canst thus express
A flowery tale more sweetly than our rhyme:
What leaf-fring’d legend haunts about thy shape
Of deities or mortals, or of both,
In Tempe or the dales of Arcady?
What men or gods are these? What maidens loth?
What mad pursuit? What struggle to escape?
What pipes and timbrels? What wild ecstasy?

Heard melodies are sweet, but those unheard
Are sweeter; therefore, ye soft pipes, play on;
Not to the sensual ear, but, more endear’d,
Pipe to the spirit ditties of no tone:
Fair youth, beneath the trees, thou canst not leave
Thy song, nor ever can those trees be bare;
Bold Lover, never, never canst thou kiss,
Though winning near the goal yet, do not grieve;
She cannot fade, though thou hast not thy bliss,
For ever wilt thou love, and she be fair!

Ah, happy, happy boughs! that cannot shed
Your leaves, nor ever bid the Spring adieu;
And, happy melodist, unwearied,
For ever piping songs for ever new;
More happy love! more happy, happy love!
For ever warm and still to be enjoy’d,
For ever panting, and for ever young;
All breathing human passion far above,
That leaves a heart high-sorrowful and cloy’d,
A burning forehead, and a parching tongue.