Plagiarism – 2020

PG has been looking into contemporary plagiarism over the past several days and will be writing more than one post about the topic.

The problem has several parts (PG has learned about five elements so far, and there may be more):

1. When Amazon and others permit an author or plagiarist to self-publish books around the world in a large number of languages, how does an author even discover that plagiarism is taking place?

2. College and university professors (and some high school teachers) are increasingly likely to screen student papers using plagiarism detection software – Turnitin is one of the most popular tools. Some time ago, students learned that copying and pasting a paper or segments of various papers they found online was an easy shortcut to creating a paper to turn in by a class deadline. Sometimes the online sources even included footnotes formatted in proper academic form. Plagiarism detection software is designed to catch such activities.

3. Where there are electronic plagiarism weapons, almost inevitably, there will be electronic or other defenses that prevent detection of plagiarism – paraphrasing the plagiarized information is one tactic that has been used since well before Turnitin came into being. For further information, see, for example, How to Beat Turnitin in 2019 and Get Away with It

4. While many of the ways of beating academically-oriented plagiarism detection are focused on manipulating a student paper, other, more sophisticated computerized tools often referred to as “Spinners” or “Article Spinners” can be used to not only fool college plagiarism checkers, but also make it difficult for the author of a book to discover plagiarism and prove copyright infringement in court.

Article Spinners were developed in the period before Google’s search engine developed the intelligence it has today.

The goal for some search engine optimizers was to generate as many pages as possible with key words of interest to Google and, thus, advertisers. The spinners were created to substitute various synonyms for parts of an article on a topic. Thus, “good” in the original article would be changed to “great”, “super”, “excellent”, etc. Several different words would be spin-treated. Thus, one four-paragraph article on fishing lures could be spun into a thousand articles about fishing lures, each seeming to be a different page to Google. If someone was searching for fishing lures, Google would rank the site with a thousand articles about fishing lures higher than a site with one article.
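For readers who want to see how little machinery synonym spinning requires, here is a minimal Python sketch. It is only an illustration of the idea described above, not the code of any actual spinning product: the synonym table, the sample sentence and the function name `spin` are all hypothetical, and real spinners reportedly used far larger thesauri.

```python
import itertools
import re

# Hypothetical synonym table, invented for this example.
# Each entry lists the original word first, followed by its substitutes.
SYNONYMS = {
    "good": ["good", "great", "super", "excellent"],
    "lures": ["lures", "baits"],
    "catch": ["catch", "land", "hook"],
}

def spin(text, synonyms=SYNONYMS):
    """Yield every variant of `text` produced by swapping in listed synonyms."""
    # Split into words and the separators between them, so that spacing
    # and punctuation are preserved when the pieces are joined again.
    tokens = re.findall(r"\w+|\W+", text)
    # Each token maps to its alternatives, or to itself if it is not in the
    # table. Lookup is lowercase, so a capitalized match loses its capital.
    choices = [synonyms.get(tok.lower(), [tok]) for tok in tokens]
    # The Cartesian product walks through every combination of choices.
    for combo in itertools.product(*choices):
        yield "".join(combo)

original = "A good set of lures helps you catch more fish."
variants = list(spin(original))
print(len(variants))  # 4 * 2 * 3 = 24 variants from three swappable words
print(variants[-1])
```

The count multiplies quickly: ten swappable words with only two choices each already give 2**10 = 1,024 variants, which is how a short article could become roughly a thousand pages.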

Google has become smarter, so spinning doesn’t work there any more, but spinning software is still around and has reportedly become more sophisticated. Pour the text of a romance ebook into spinning software and out comes another romance that has a similar plot but different character names, places, descriptions, etc.

PG understands that the products of current spinning software require a significant amount of editing, but, if you’re planning to sell an 80,000 word romance, it’s a lot less work to do a quick copy edit than to write a book, develop characters, etc., from scratch.

5. Artificial Intelligence software has become more and more sophisticated in the past couple of years and no one expects progress to stop. And it is currently being used to write stories. Bloomberg generates about half of its articles about public companies and their latest earnings releases using artificial intelligence.

From Forbes magazine in February, 2019:

How do you know I am really a human writing this article and not a robot?  Several major publications are picking up machine learning tools for content. So, what does artificial intelligence mean for the future of journalists?

According to Matt Carlson, author of “The Robotic Reporter”, the algorithm converts data into narrative news text in real-time.

Many of these are financially focused news stories since the data is calculated and released frequently. Which is why it should be no surprise that Bloomberg News is one of the first adopters of this automated content. Their program, Cyborg, churned out thousands of articles last year that took financial reports and turned them into news stories like a business reporter.

. . . .

Forbes also uses an AI called Bertie to assist in providing reporters with first drafts and templates for news stories.

The Washington Post also has a robot reporting program called Heliograf. In its first year, it produced approximately 850 articles and earned The Post an award for its “Excellence in Use of Bots” from its work on the 2016 election coverage.

. . . .

The LA Times is using AI to report on earthquakes based on data from the U.S. Geological Survey and also tracks information on every homicide committed in the city of Los Angeles. The site created by the machine, called “Homicide Report”, utilizes a robot-reporter with the ability to write drafts of stories that include: the victim’s gender and race, cause of death, officer involvement, neighborhood and year of death.

. . . .

The AP estimates that AI helps to free up about 20 percent of reporters’ time spent covering financial earnings for companies and can provide better accuracy. This gives reporters more time to concentrate on the content and story-telling behind an article rather than the fact-checking and research.

Link to the rest at Forbes

Contemporary artificial intelligence is leagues beyond article spinners, and detecting that the work of another author (or several other authors) served as the source material for an AI writing romance or other types of book-length fiction or non-fiction may already be difficult or next to impossible.

PG is interested in this issue as it relates to copyright infringement in the 21st century and will have a few more posts.

Where There is Creativity, There is Plagiarism

From Plagiarism Today:

Plagiarism can often seem invisible.

Not only do plagiarists often go to great lengths to hide their activities but, even when it’s done in broad daylight, those that aren’t actively looking for it will usually miss it. It’s very easy to look around you and feel confident that you’re in a relatively plagiarism-free zone.

But the truth is much different. Plagiarism is literally everywhere that there is creativity. It doesn’t matter what kind of work you create or how it’s created, if there is originality and expression, you’re likely to find plagiarism.

In the more than 15 years I’ve been running Plagiarism Today, we’ve discussed plagiarism in a wide variety of environments including knitting, board games, video games, flag design, API development, YouTube videos (not counting other copyright issues), poetry, podcasts, comic books, architecture, marketing and much, much more.

If there’s creativity in an industry, there’s a near-guarantee that there is plagiarism in it. That’s because, whenever there’s a barrier to creating something, whether it’s an essay or movie, you can rest assured someone will be there to take whatever shortcuts they can to create their own.

. . . .

As Jason Chu of Turnitin once said, “Plagiarism is about putting outcomes ahead of processes.”

In short, a plagiarist is someone who wants the outcome of having created something but doesn’t value or respect the process of creating that thing. In a simple example, a student who wants an A on a paper but doesn’t want to go through the trouble of actually writing such a paper may be tempted to plagiarize it.

To be clear, not every person that feels this way will be a plagiarist. Many students may not care about or see the value in writing an essay, but most will grit their teeth and do the work, either out of a sense of honesty or a fear of reprisal.

However, a student that values or even enjoys the process of writing an essay or completing an assignment will be much less tempted to plagiarize, regardless of their sense of honesty or how much they fear getting caught.

. . . .

But to creatives, this can seem alien. Why would you want to create something and not have it be original? Why would you want to put your name on something that someone else already made?

The reason is that we, as a society, value creators. Though, not always enough to avoid pirating their work, there is still a cult of celebrity placed around authorship and creativity. Whether it’s authors, filmmakers, musicians, artists, photographers or any other type of work, there’s a lot of appeal to being a creator.

That societal value is only matched by some people’s individual willingness to take shortcuts. In short, being a creative is very appealing, especially in the digital age when just about anyone can find an audience, but being creative requires a great deal of hard work and there are many that find that too high of a cost.

. . . .

[One step toward the deterrence of plagiarism] is to be honest about what it takes to create a work.

For example, this post, and ones like it, do not spring fully formed from my mind. They often take hours of work, sometimes broken up over multiple days. Even for all of my typos and grammar mistakes, there is a great deal of editing, revision and preparation that goes into them as well.

However, that’s not something that people see. We have created a mythos around creativity where a great work is the product of a brilliant mind, not the toil of countless hours of hard, often dull, labor.

Creativity is not something that’s available on demand and it rarely bears any fruit of worth without being combined with hard work. However, we don’t talk about those elements and that sets up unrealistic expectations for those who have never done it themselves.

How can we expect others to respect the process of creating something when we aren’t always open and honest about that process ourselves?

Our cult of creativity has minimized the work that goes into creating something and put the focus on an intangible spark or a mythical completely self-contained idea that sprang forth fully formed. Neither are true.

Creativity is work and, though more work does not equal better product, if we were more open about how works were actually created, others might feel less justified in skipping the invisible work or copying the elusive creativity.

Link to the rest at Plagiarism Today

A Cautionary Tale of High-Priced Plagiarism

From Plagiarism Today:

Imagine paying $115,000 for a report and not being able to do anything with it due to plagiarism and/or attribution issues. Likewise, imagine being a well-known expert and professor, one that’s routinely cited in the media, and now having one of your best-known stories being a lawsuit over said issues.

That’s exactly the situation that’s befallen a right-leaning advocacy group, the State Government Leadership Foundation, and Stuart N. Brotman, a current professor at the University of Tennessee.

The story involves a massive mess that, after nearly two years in court, finally reached a settlement. Along the way, the case damaged the career of an otherwise-respected expert, the work of an organization and may have hindered the efforts that the State Government Leadership Foundation was undertaking.

The worst part of it is that it all could have easily been avoided. The case winds up being a frustrating lesson not only in the need to avoid plagiarism, but in the need for organizations to think about and prepare for plagiarism, even when dealing with very expensive reports.

. . . .

Today, Stuart Brotman is the University of Tennessee Howard Distinguished Endowed Professor of Media Management and Law. However, in 2015, he was working for his own firm, Brotman Communications.

It was in 2015 that he was first contacted by the State Government Leadership Foundation (SGLF). The SGLF is, by its own description, a conservative non-profit that focuses on helping states implement conservative policies.

Around that time the SGLF took interest in a plan by the FCC that attempted to block states from passing laws that barred the creation and expansion of municipal broadband services.

. . . .

Internet providers opposed the FCC on this issue and the SGLF sought the help of Brotman to craft a report on the laws surrounding the issue.

He submitted that report to the SGLF later that year and was paid $115,000 for it. The SGLF then sent it to Dr. George Ford, a researcher that was assigned to examine the economic aspects of the issue. However, when Ford submitted the report to Lawrence Spiwak, another expert on the relevant laws, Spiwak recognized his own words.

Ford then conducted an investigation and showed that Brotman had used passages from a variety of sources without quotations or proper attribution. They forwarded their findings to the SGLF, who ended up not publishing the report.

Instead, they asked him to revise the report, which Brotman said he did, but the SGLF found the finished report was still too poorly attributed to be used. Brotman couldn’t explain how so much content remained, but said that it could be because, in March 2018, some of his files were “corrupted” in a hack from Iran that targeted thousands of professors.

The SGLF demanded its money back but Brotman proactively sued them, accusing the foundation of sharing his “confidential” report. That element was quickly dismissed but the SGLF countersued in hopes of getting their money back.

. . . .

According to Brotman in the lawsuit, he said he was not asked to produce an original report but was instead asked to provide a “review” of prior research. However, that argument is, to put it mildly, glib.

As both a journalism professor and an expert in the field of law, Brotman likely knew that the SGLF wanted a report that they could use in public discourse and that even a review requires proper citations. Simply put, he plagiarized and that appears to be pretty clear from his own testimony at the trial, even if he avoided using the word “plagiarism” on the stand and still denies doing it.

. . . .

No one likes to believe an expert or author they hire would ever plagiarize, but you have to prepare for that possibility and, perhaps, that inevitability.

Having strong anti-plagiarism clauses in your contracts and a process for checking works upon receipt (and before payment) could have headed this off. To be clear, the SGLF is not to blame for being given a plagiarized report, but a better plagiarism plan likely could have averted the embarrassment and the litigation.

When it comes to plagiarism, this case is a reminder that you can’t afford to trust blindly. Despite spending $115,000 for the report, the SGLF didn’t spend a few hundred dollars for an automated analysis or a few thousand for a human one.

Link to the rest at Plagiarism Today

PG says that during primeval times before the explosion of the internet, an author of dubious repute might risk significant plagiarism, particularly if writing an article directed toward a small non-expert audience (patients reading magazines in physicians’ waiting rooms) regarding a topic about which only a handful of individuals were real experts who, therefore, would be uninterested in reading People magazine under any circumstances.

Regarding the OP, if the author of the plagiarised report were a law school professor, he would be accustomed to reading and writing documents filled with footnotes documenting authorities and sources for nearly every paragraph. See, for example, a page from the April, 2019, issue of The Duke Law Journal:


[Embedded PDF: page from The Duke Law Journal]

PG finds the behavior of the professor described in the OP (which may not accurately describe such behavior) to be very strange.

Not being either a law school professor (although he and two other lawyers had a pleasant lunch with a law school professor yesterday in a meeting of a very local bar association) or an expert on the Federal Communications Commission and its various and sundry activities, PG would expect that an advocacy document such as the professor wrote for the plaintiff in the OP would gain some sort of increased credibility if it contained footnotes and a copious appendix of sources.

(PG notes in passing that Mrs. Lascelles, one of his marvelous elementary school teachers, would have gently pointed out the run-on characteristics of the prior sentence. No computer grammar checker could ever exceed Mrs. Lascelles at that task.)

(PG also notes that the author described in the OP was not a law school professor at the time he wrote the article, but dense and extensive footnotes are something a law school professor is expected to understand on the first minute of the first day he/she begins work. Additionally, nothing is easier than padding an expert’s report by copying and pasting a list of sources into an appendix.)

(PG also notes that he is in a startlingly verbose mood today and promises to rein in such tendencies for the remainder of today’s posts.)

The Problem with Press Release Plagiarism

From Plagiarism Today:

In the summer of 2011, the subject of press release plagiarism became the center of a journalism ethics debate as Kansas City Star reporter Steve Penn was fired for repeated instances of using press release content without citation.

A year later, just after the Jonah Lehrer scandal broke, Penn sued his former employer for defamation saying that using press releases in such a manner was not plagiarism, but rather, a common practice.

The lawsuit was dismissed by a judge in 2016. The judge found that the allegedly defamatory statements were true and made without actual malice.

However, this doesn’t mean that Penn didn’t have his defenders. The Public Relations Society of America (PRSA), wrote a post defending Penn and saying that, in their view, copying and pasting from press releases without attribution was acceptable because “PRSA views the issuance of a news release as giving implicit consent to re-use and publish the news release’s content.”

That said, the PRSA did add, “Attribution is recommended, for example, when a direct quote is re-used, or facts and figures are cited.”

But recommended does not equal required and PR firms are, in general, very happy to have reporters use their press releases verbatim with or without attribution.

However, to say that such approval makes press release plagiarism OK ignores a fundamental part of why plagiarism is wrong. Sure, the plagiarized may approve of the use, but that doesn’t address the disservice that plagiarism does to the audience.

It’s that issue that makes press release plagiarism a journalistic sin and a practice to be avoided.

. . . .

When it comes to plagiarism, many people look at it solely through the prism of the plagiarized party being wronged and the plagiarist being a “thief” of their work. To that end, if a person offers up their work to be plagiarized, plagiarism appears to be a victimless crime.

However, the plagiarized party is not the only victim. Plagiarism, at its most fundamental level, is a lie. It’s a person saying that they wrote or created something that they did not. That lie, however, isn’t told to the plagiarized party, but to the audience.

This lie can have many impacts on the audience. It can cause the audience to think more highly of the author if the work is high quality and plagiarism can give more weight and trust in the writing if the author is well-respected.

The latter is the bigger problem for journalists. Journalists, especially at major publications, have a name and status that carries weight. They are meant to be an impartial source that works to represent the facts of a story as accurately as possible, not simply a mouthpiece for the subject of the story.

People inherently mistrust press releases and for good reason. Though most PR professionals are honest and do ethical work, they are definitely trying to present their employers in the most favorable light possible. In short, they are an inherently biased source.

Journalists, however, are supposed to try and divorce themselves of such bias. By copying from press releases without proper attribution, they’re not only presenting the words of someone else as their own, but they are also failing to indicate that those words come from the subject of the story and may have a large issue with bias.

Link to the rest at Plagiarism Today

Contract Cheating

From Thomas Lancaster, Academic Integrity Expert:

Contract cheating is a term that we originally publicised in 2006, based around a research study carried out on the use of the RentACoder (now Freelancer) site. The working definition of contract cheating has changed over a series of subsequent studies, talks and publications, but we’d generally classify this loosely along the lines of:

Contract cheating describes the process through which students can have original work produced for them, which they can then submit as if this were their own work. Often this involves the payment of a fee and this can be facilitated using online auction sites.

One of the most striking aspects of the original research into contract cheating has been how cheaply students can have work produced for them. Often, this costs only a few dollars when an agency site is used, using an auction process to help students find people to create assignments for them. This work is often produced far cheaper than traditional essay mills.

. . . .

Since contract cheating produces original work, this is unlikely to be picked up using standard text matching plagiarism detection services such as Turnitin.

Some of the more interesting findings across our research have related to the extent of the use of contract cheating services. Very few students do this as a one off, suggesting that there are students who are continually cheating (and, presumably, getting away with it). There are also outsourcers who have published tens, if not hundreds, of assignments, made up from a variety of different universities and courses. This suggests that a “third party subcontractor” is in operation, likely taking orders from students at a high price and then outsourcing them again themselves at a lower price.

. . . .

There is a lot of potential for further research into contract cheating, in particular trying to establish how and why students cheat. There is also a gap in the knowledge about how to detect this contract cheating. A variety of methods have been proposed, from requiring all assignment specifications to be submitted to a central repository to make them traceable, to using techniques from linguistics to investigate when an assignment has not been written by the student who submitted it.

. . . .

Beyond this, there are parallels with the research into the anti-plagiarism fields, in particular looking at the policies, processes and penalties surrounding contract cheating, and how to write assignments to prevent contract cheating.

Link to the rest at Thomas Lancaster

For those looking for an interest hook about this topic, would you be concerned if you learned your physician procured his/her undergraduate and professional education by paying others to create assigned work? How about the accountant who prepares your tax returns?

There is a website devoted to the challenge of Contract Cheating.

PG was interested to find a discussion of a possible legal approach to sanction contract cheating.

From The International Journal for Educational Integrity via Google Scholar:

The phenomenon of contract cheating presents, potentially, a serious threat to the quality and standards of Higher Education around the world. There have been suggestions, cited below, to tackle the problem using legal means, but we find that current laws are not fit for this purpose. In this article we present a proposal for a specific new law to target contract cheating, which could be enacted in most jurisdictions.

. . . .

Contract cheating, as we define here, is a basic relationship between three actors; a student, their university, and a third party who completes assessments for the former to be submitted to the latter, but whose input is not permitted. ‘Completes’ in this case means that the third party makes a contribution to the work of the student, such that there is reasonable doubt as to whose work the assessment represents.

. . . .

Proposal for a new “offence to provide or advertise cheating services”

(1) A person commits an offence if the person provides any service specified in subsection (4) but in the case of a service being provided in part then a person commits an offence only if the assignment or work could not otherwise be reasonably considered to be that of the student concerned

(2) A person commits an offence if the person advertises any services specified in subsection (4)

(3) A person commits an offence who, without reasonable excuse, publishes an advertisement for any service specified in subsection (4).

(4) The services referred to in subsections (1) to (3) are—

 a. completing in whole or in part an assignment or any other work that a student enrolled at a Higher Education provider is required to complete as part of a Higher Education course in their stead without authorisation from those making the requirement;

 b. providing or arranging the provision of an assignment or any other work (in whole or in part) that a student enrolled at a Higher Education provider is required to complete as part of a Higher Education course in their stead without authorisation from those making the requirement;

(5) A person shall not be guilty of an offence in subsections (1) (2) and (3) above if he or she demonstrates that they did not know and could not with reasonable diligence have ascertained that the services might or would be used for the purposes specified in subsection (4)

(6) Where a body corporate is guilty of an offence under this section and the offence is committed with the consent or connivance of, or to be attributable to neglect on the part of, a director, manager, secretary or other similar officer of the body corporate, or a person who was purporting to act in any such capacity, he or she, as well as the body corporate, is guilty of that offence.

Using ‘strict liability’ removes the need to show intent on behalf of the provider of contract cheating services. The offence could be added to existing legislation (e.g. fraud, or education laws) or could stand alone. It would apply to individuals as well as companies; a friend or family member who completes an assignment for a student would be committing an offence. It would apply to examinations as well as coursework, thus covering exam impersonation (although a separate specific offence to this effect could be included to put the matter beyond doubt).

Link to the rest at The International Journal for Educational Integrity

Self-Plagiarism: When Is Re-Purposing Text Ethically Justifiable?

From Qrius:

In an institutional environment where researchers may be coming under increasing pressure to publish, the temptations to take short cuts and engage in duplicate or redundant publication can be significant. Duplicate publication involves re-publishing substantially the same data, analysis, discussion and conclusion, without providing proper acknowledgement or justification for the practice. Such behaviour is often condemned as ideoplagiarism or self-plagiarism, locating this practice as a parallel activity to that which appropriates other people’s ideas and words and reproduces them without due acknowledgement.

There are good reasons for censuring self-plagiarism – it distorts the academic record where meta-analyses are not aware of the duplicate publication, and provides an unfair advantage when academics’ track records are being compared.

. . . .

Global rankings and national assessments of universities are largely based on research inputs and outputs. Mostly, the output indicators privilege publications in international higher-ranking journals; the vast majority of those only publish in English. However, there are several good reasons why research outputs should also appear outside English-language journals.

. . . .

[R]esearchers often have made a commitment to disseminate the results of their studies to participants or to policy-makers – where either of these communities are not English-speaking, republishing in a language other than English may be entirely appropriate.

So, revising a published paper and translating that into a language other than English might be a laudable way of preserving a research culture in a small language group, influencing policy-makers or returning a benefit to a low- or middle-income country (LMIC). This activity, of course, needs to be acknowledged and transparent and cannot be double-counted as a research output.

. . . .

In 2018, I co-authored an article on research ethics in Taiwan with a Taiwanese academic (Gan and Israel, in press). This will be published in Developing World Bioethics and we shall explore the possibility of modifying it for a Mandarin version aimed specifically at a readership of Taiwanese academics and policy-makers. While many senior Taiwanese academics are fluent in English, this is less likely to be the case among those who have not completed postgraduate qualifications in North America, Australasia or the United Kingdom. Publishing in Mandarin would extend access to our work (including allowing it to be found in a search using Traditional Chinese script), and may make it more readily available for undergraduate teaching. Sometimes, we can craft opportunities to help readers of other languages without translating the entire article. A recent article that I co-authored with Lisa Wynn (Wynn & Israel, 2018) took advantage of the American Anthropologist’s policy of publishing all abstracts in both English and Spanish. At our request, the editors agreed to add abstracts in Arabic and French.

I wonder if fear of being seen as self-plagiarising also inhibits academics writing book chapters in research ecosystems where chapters do not count for much.

. . . .

[I]t is difficult to continually deliver a novel angle for such a chapter, when the brief from the commissioning editor is so similar. I have collaborated with co-authors in order to develop new directions. However, sometimes this is not practicable and yet there may still be some value in repurposing existing text and tailoring it for a new audience.

. . . .

[T]he publishers as a matter of policy quite understandably challenged any article that relied on previously published work for more than ten per cent of its material. However, the editor had approached me looking for a synthesis of work that included, updated and condensed material that had already appeared in my single-authored book. I had raised the matter of overlapping text with him, and so he was able to persuade the publisher that a far larger fraction was warranted in this case. My book publisher also agreed.

. . . .

[T]he 2018 Australian Code places responsibility on researchers to ‘Disseminate research findings responsibly, accurately and broadly…’ (Research Responsibility 23).

. . . .

None of these codes or guidelines explicitly considers repurposing existing text, nor do they focus their discussion of dissemination on academic publications. Nevertheless, they do require us to consider what dissemination strategy might be most appropriate and this may well involve adapting and translating material for academic publication in order to reach new audiences.

Link to the rest at Qrius

PG will note in passing that some of the best and some of the worst professors he encountered during his undergraduate studies were widely-published. Granted this is a limited sample, but he found no reliable correlation between quality instruction in the classroom and the number of publications a particular professor claimed.

PG supposes counting scholarly articles and rating the academic journals in which they appear is an easier task than determining whether students of a particular professor are receiving a quality educational experience, but it does reflect where students stand in the university’s calculation of the reputation of a professor.

This is a separate question from whether there is really anything new and interesting to say about Milton.

PG felt bad after writing the preceding sentence and feels the need to balance his Milton snark with a reminder of what lovely poetry the man wrote (as opposed to the many unlovely scholarly articles that have been written about Milton):

On His Blindness

When I consider how my light is spent
Ere half my days in this dark world and wide,
And that one Talent which is death to hide
Lodged with me useless, though my soul more bent
To serve therewith my Maker, and present
My true account, lest He returning chide,
“Doth God exact day-labour, light denied?”
I fondly ask. But Patience, to prevent
That murmur, soon replies, “God doth not need
Either man’s work or his own gifts. Who best
Bear his mild yoke, they serve him best. His state
Is kingly: thousands at his bidding speed,
And post o’er land and ocean without rest;
They also serve who only stand and wait.”

PG must also note that Milton slowly became blind, likely from glaucoma. In addition to the above-mentioned sonnet, Milton wrote Paradise Lost (blank verse, ten books with over ten thousand lines of verse, dictated to scribes) when he was completely blind.

The Emotional Arcs of Stories Are Dominated by Six Basic Shapes

From The Computational Story Laboratory (citations omitted):

The power of stories to transfer information and define our own existence has been shown time and again. We are fundamentally driven to find and tell stories, likened to Pan Narrans or Homo Narrativus. Stories are encoded in art, language, and even in the mathematics of physics: We use equations to represent both simple and complicated functions that describe our observations of the real world. In science, we formalize the ideas that best fit our experience with principles such as Occam’s Razor: The simplest story is the one we should trust. We also tend to prefer stories that fit into the molds which are familiar, and reject narratives that do not align with our experience.

We seek to better understand stories that are captured and shared in written form, a medium that since inception has radically changed how information flows. Without evolved cues from tone, facial expression, or body language, written stories are forced to capture the entire transfer of experience on a page. An often integral part of a written story is the emotional experience that is evoked in the reader. Here, we use a simple, robust sentiment analysis tool to extract the reader-perceived emotional content of written stories as they unfold on the page.
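For readers curious about what "extracting emotional content as a story unfolds" might look like in practice, here is a minimal sketch. It is an illustration only and not the Story Lab's actual tool: the word scores in TOY_SCORES are invented, and the function name emotional_arc and the window and step parameters are hypothetical. The idea is to slide a fixed-size window across the text and average the scores of the words that appear in a lexicon.

```python
import re

# Hypothetical toy lexicon: word -> happiness score on a 1-9 scale.
# These values are made up for illustration; a real lexicon would be
# far larger and built from human ratings.
TOY_SCORES = {
    "happy": 8.0, "love": 8.5, "joy": 8.2, "hope": 7.5,
    "sad": 2.5, "death": 1.5, "fear": 2.0, "lost": 3.0,
}

def emotional_arc(text, window=6, step=3):
    """Return the mean lexicon score for each sliding window of words."""
    words = re.findall(r"[a-z']+", text.lower())
    arc = []
    last_start = max(len(words) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = words[start:start + window]
        scores = [TOY_SCORES[w] for w in chunk if w in TOY_SCORES]
        # Windows containing no scored words are recorded as None.
        arc.append(sum(scores) / len(scores) if scores else None)
    return arc

sample = ("love and joy filled the happy house "
          "then fear and death left them sad and lost")
print(emotional_arc(sample))
```

On the sample sentence the scores start high and end low, which is the "fall" shape in miniature. A whole novel would need a much larger window so that individual words do not dominate the curve.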

We objectively test the theories of folkloristics, specifically the commonality of core stories within societal boundaries. A major component of folkloristics is the study of society and culture through literary analysis. This is sometimes referred to as narratology, which at its core is “a series of events, real or fictional, presented to the reader or the listener”, who further define narrative and plot. In our present treatment, we consider the plot as the “backbone” of events that occur in a chronological sequence. We first find an analogous definition in Aristotle’s theory of the three act plot structure: A central conflict emerges in act one, followed by two major turning points in acts two and three before concluding with a final resolution. While the plot captures the mechanics of a narrative and the structure encodes their delivery, in the present work we examine the emotional arc that is invoked through the words used. The emotional arc of a story does not give us direct information about the plot or the intended meaning of the story, but rather exists as part of the whole narrative. This distinction between the emotional arc and the plot of a story is one point of misunderstanding in other work. Through the identification of motifs, narrative theories allow us to analyze, interpret, describe, and compare stories across cultures and regions of the world. We show that automated extraction of emotional arcs is not only possible, but can test previous theories and provide new insights with the potential to quantify unobserved trends as the field transitions from data-scarce to data-rich.

. . . .

We consider a range of these theories in turn while noting that plot similarities do not necessitate a concordance of emotional arcs.

  • Three plots: In his 1959 book, Foster-Harris contends that there are three basic patterns of plot (extending from the one central pattern of conflict): the happy ending, the unhappy ending, and the tragedy. In these three versions, the outcome of the story hinges on the nature and fortune of a central character: virtuous, selfish, or struck by fate, respectively.
  • Seven plots: Often espoused as early as elementary school in the United States, we have the notion that plots revolve around the conflict of an individual with either (1) him or herself, (2) nature, (3) another individual, (4) the environment, (5) technology, (6) the supernatural, or (7) a higher power.
  • Seven plots: Representing over three decades of work, Christopher Booker’s The Seven Basic Plots: Why we tell stories describes in great detail seven narrative structures:
    • Overcoming the monster (e.g., Beowulf).
    • Rags to riches (e.g., Cinderella).
    • The quest (e.g., King Solomon’s Mines).
    • Voyage and return (e.g., The Time Machine).
    • Comedy (e.g., A Midsummer Night’s Dream).
    • Tragedy (e.g., Anna Karenina).
    • Rebirth (e.g., Beauty and the Beast).

In addition to these seven, Booker contends that unhappy endings of all but the tragedy are also possible.

  • Twenty plots: In 20 Master Plots, Ronald Tobias proposes plots that include “quest”, “underdog”, “metamorphosis”, “ascension”, and “descension”.
  • Thirty-six plots: In a translation by Lucille Ray, Georges Polti attempts to reconstruct the 36 plots that he posits Gozzi originally enumerated. These are quite specific and include “rivalry of kinsmen”, “all sacrificed for passion”, both involuntary and voluntary “crimes of love” (with many more on this theme), “pursuit”, and “falling prey to cruelty of misfortune”.

The rejected master’s thesis of Kurt Vonnegut, which he personally considered his greatest contribution, defines the emotional arc of a story on the “Beginning–End” and “Ill Fortune–Great Fortune” axes. Vonnegut finds a remarkable similarity between Cinderella and the origin story of Christianity in the Old Testament, leading us to search for all such groupings. In a recorded lecture available on YouTube, Vonnegut asserted: “There is no reason why the simple shapes of stories can’t be fed into computers, they are beautiful shapes.”

. . . .

For a suitable corpus we draw on the freely available Project Gutenberg data set. We apply rough filters to the entire collection in an attempt to obtain a set of 1,737 books that represent English works of fiction.

. . . .

Using principal component analysis, we find broad support for six emotional arcs:

  • “Rags to riches” (rise).
  • “Tragedy”, or “Riches to rags” (fall).
  • “Man in a hole” (fall–rise).
  • “Icarus” (rise–fall).
  • “Cinderella” (rise–fall–rise).
  • “Oedipus” (fall–rise–fall).
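To give a feel for how a dominant shape can be pulled out of many arcs, here is a rough sketch of the principal component idea. It is not the authors' code. The four short arcs are invented data, the function name first_mode is hypothetical, and the sketch swaps in plain power iteration to estimate only the first principal component. Centering each arc on its own mean is also an assumption of this sketch.

```python
import math

def first_mode(arcs, iterations=200):
    """Estimate the dominant shared shape (first principal component).

    Uses power iteration on A^T A, where the rows of A are the arcs
    after each one has been centered on its own mean.
    """
    n_points = len(arcs[0])
    # Center each arc so only its shape matters, not its overall level.
    centered = [[v - sum(a) / n_points for v in a] for a in arcs]
    # Deterministic, non-constant starting vector.
    mode = [float(i + 1) for i in range(n_points)]
    for _ in range(iterations):
        # Project every arc onto the current estimate of the shape...
        coeffs = [sum(x * m for x, m in zip(row, mode)) for row in centered]
        # ...then rebuild the estimate as the weighted sum of the arcs.
        mode = [sum(c * row[j] for c, row in zip(coeffs, centered))
                for j in range(n_points)]
        norm = math.sqrt(sum(m * m for m in mode))
        mode = [m / norm for m in mode]
    return mode

# Invented arcs: three stories that rise and one that falls.
arcs = [
    [1, 2, 3, 4, 5],
    [2, 3, 4, 5, 6],
    [5, 4, 3, 2, 1],
    [1, 2, 4, 4, 5],
]
mode = first_mode(arcs)
print([round(m, 2) for m in mode])
```

The recovered shape here is a steady rise. A rising story projects onto it with a positive coefficient and a falling story with a negative one, which is how a single mode can stand for both “Rags to riches” and “Riches to rags”.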

See the rest in the following embedded document.

(Trigger Warning: There is math.)


Secondhand Books: the Murky World of Literary Plagiarism

From The Guardian:

“As if there was much of anything in any human utterance, oral or written, except plagiarism!” opined Mark Twain more than 100 years ago. “The kernel, the soul – let us go further and say the substance, the bulk, the actual and valuable material of all human utterances – is plagiarism.”

Twain was writing to his friend, the deafblind author Helen Keller, after reading her autobiography, in which she recounted her own experiences of being accused of – and admitting to – plagiarism. When she was 12, Keller wrote a short story called The Frost King. It was published and, as Keller recounts, “this was the pinnacle of my happiness” – until she was “dashed to earth” when the similarities between her story and Margaret T Canby’s The Frost Fairies emerged. “The two stories were so much alike in thought and language that it was evident Miss Canby’s story had been read to me, and that mine was a plagiarism,” wrote Keller. She was subjected as a child to a formal investigation at the Perkins Institution for the Blind over whether or not she had plagiarised deliberately. It acquitted her; although she admitted she must have read Canby’s story, she could remember nothing of it.

“I have ever since been tortured by the fear that what I write is not my own,” she wrote. “It is certain that I cannot always distinguish my own thoughts from those I read, because what I read becomes the very substance and texture of my mind … My compositions are made up of crude notions of my own, inlaid with the brighter thoughts and riper opinions of the authors I have read.”

Or, as Nicola Solomon at the Society of Authors puts it, “fiction writers are magpies”. This month alone, we have seen the New York Times note the “striking” similarities between AJ Finn’s bestselling mystery The Woman in the Window and a self-published novel released two years before, a romance writer who blamed her ghostwriter for copying passages from other writers’ books, and Australian author Nick Milligan pointing out the common features in his novel Enormity and Danny Boyle’s forthcoming film Yesterday.

. . . .

Solomon says the Society is often contacted by authors complaining that another has pilfered an idea. “Some cases are absolutely straightforward, but legal action is very tricky in cases which don’t concern actual language copying but rely on copying of themes, plots or structure. They’re expensive to take, they depend on line-by-line checking of work and unfortunately many authors don’t have the funds to go against large publishing or film companies to take action with no guarantee of success.”

It has long been claimed that there are somewhere between three and 36 basic plots in all forms of storytelling. Three years ago, academics fed nearly 2,000 stories into a computer analysis and concluded that there were six “core trajectories” for all stories. None of these common plots, however, include a character called Jack who passes off the Beatles’ music as his own on another planet (Milligan’s Enormity and Boyle’s new film), or an alcoholic, agoraphobic woman who watches a crime play out in the house opposite her own (Finn’s bestseller and British author Sarah A Denzil’s Saving April).

. . . .

Finn’s thriller was published in January 2018; Denzil’s novel was self-published in March 2016. Reviews on Amazon and Goodreads have pointed out the similarities between the two books for months. “Almost the same thing,” says one Goodreads reviewer on Saving April. “Seems so similar to Girl in the Window. Feels like I just read a different version of that,” runs an Amazon review of Denzil’s book.

The New York Times recently noted the parallels between the two being “numerous, and detailed”. And they are (spoilers ahead): both centre on a woman who has lost a child and partner through a car accident for which she blames herself, a fact that isn’t at first revealed. Both feature a new family moving in opposite, whom the agoraphobe watches. She becomes convinced the husband is abusing the wife and befriends their adopted teenager who, after gaining her trust, turns out to be the villain.

. . . .

The New Yorker profile identified another work similar to The Woman in the Window: the 1995 film Copycat, starring Sigourney Weaver and Holly Hunter. The film and the novel both see a psychologist become trapped in her home by agoraphobia, drink too much, be mistrusted by the police and join a forum that turns out to be dangerous. The director of Copycat, Jon Amiel, told the New Yorker that the debt was “not actionable, but certainly worth noting, and one would have hoped that the author might have noted it himself”.

Link to the rest at The Guardian

Some of the books mentioned in the OP: