The following are excerpts from a website called Similar Works. As the site states, its purpose is to help protect authors from plagiarism of their books.
Beginning of Excerpts:
Protect your book from plagiarism
Similar Works is a web application built to protect authors from having their work exploited and their stories taken from them.
Upload your book to Similar Works and we’ll scan the text against other titles and keep monitoring — and alert you when we find any matches.
. . . .
How It Works
The Similar Works system analyzes ebook files and identifies matching text.
Books are submitted by authors and publishers who wish to protect their copyright, or concerned readers who believe they have found a work containing plagiarized material.
Similar Works reviews all books manually before accepting them into the archive.
Once a book is accepted, the system checks that book against all other titles already added.
If we find text matches that look suspicious, we contact the author or publisher and provide information so that they can take further action to protect their copyright.
We continue to check every time a new book is added to the archive.
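The site does not publish its matching code, so the following is only a rough sketch of the "check each new book against everything already in the archive" idea. The Archive class, the sentence-level exact matching, and the normalize helper are all assumptions made for illustration, not the Similar Works implementation.

```python
import re
from collections import defaultdict


def normalize(sentence):
    """Lowercase a sentence and drop punctuation so trivial edits still match."""
    return " ".join(re.findall(r"[a-z0-9']+", sentence.lower()))


def split_sentences(text):
    """Naive sentence splitter (an assumption; real ebooks need more care)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


class Archive:
    """Hypothetical archive: maps each normalized sentence to the titles containing it."""

    def __init__(self):
        self.index = defaultdict(set)

    def add_book(self, title, text):
        """Check a new book against all earlier titles, then add it to the index.

        Returns {earlier_title: [matching sentences from the new book]}.
        """
        matches = defaultdict(list)
        keys = []
        for sentence in split_sentences(text):
            key = normalize(sentence)
            if not key:
                continue
            keys.append(key)
            for other_title in self.index.get(key, ()):
                matches[other_title].append(sentence)
        # Index the new book only after checking, so it never matches itself.
        for key in keys:
            self.index[key].add(title)
        return dict(matches)
```

Because every accepted book is added to the index, each later upload is automatically compared against it, which mirrors the "we continue to check every time a new book is added" behavior described above.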
. . . .
The Similarity Band
The Similarity Band is a visual representation of the similarities that our system finds in books.
Each Band is a generated watermark which represents the book’s text, from beginning to end, going from left to right. This is the text as it appears inside the digital file which is uploaded into the Similar Works archive, so it includes things like the copyright notice, the table of contents, disclaimers, back matter, and samples of other books.
When a book is run through the master algorithm, the text is split into logical chunks, usually consisting of no more than a sentence or two. The Band is generated by lining up all the chunks in order, and then recording a color depending on whether a similarity has been detected within that chunk or not.
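As a loose illustration of that description, the sketch below splits text into chunks of up to two sentences and records one marker per chunk. The function names, the two-sentence chunk size, and the use of "#" and "." in place of colors are assumptions; the actual master algorithm is not public.

```python
import re


def split_chunks(text, sentences_per_chunk=2):
    """Split text into chunks of at most `sentences_per_chunk` sentences."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [
        " ".join(sentences[i:i + sentences_per_chunk])
        for i in range(0, len(sentences), sentences_per_chunk)
    ]


def build_band(chunks, is_similar, match="#", clear="."):
    """Line the chunks up in order and record one marker per chunk.

    `is_similar` is a caller-supplied function (hypothetical here) that
    reports whether a similarity was detected within a chunk.
    """
    return "".join(match if is_similar(chunk) else clear for chunk in chunks)


text = "One. Two. Three. Four. Five."
chunks = split_chunks(text)
band = build_band(chunks, lambda chunk: "Three" in chunk)
print(band)
```

Reading the resulting string from left to right corresponds to reading the book from beginning to end, just as the Band does.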
. . . .
The Band can tell you a lot about how books are related to each other! For example, if a book has a lot of stripes on the far right side of the Band, then there are similarities detected near the end of the text. That probably indicates that the same back matter or samples appear in another book. Stripes on the left indicate similarities detected near the start of the text, and they are probably disclaimers or generic copyright notices.
(We do our best to filter out disclaimers and other generic language used by a lot of authors, so hopefully you won’t see too many of those.)
Unfortunately, the algorithm can only identify similarities. It can’t tell us why the similarity exists.
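The way stripe position is read above (left side for front matter, right side for back matter or samples) could be approximated as follows. This is a minimal sketch: the band is assumed to be a list of booleans, one per chunk, and the 10% edge threshold is an arbitrary choice for illustration, not a figure from Similar Works.

```python
def stripe_regions(band, edge=0.1):
    """Count flagged chunks near the start, in the middle, and near the end.

    `band` is a sequence of booleans (True means a similarity was detected).
    `edge` is the fraction of the book treated as "start" or "end".
    """
    counts = {"start": 0, "middle": 0, "end": 0}
    total = len(band)
    for i, flagged in enumerate(band):
        if not flagged:
            continue
        position = (i + 0.5) / total  # relative position of the chunk's center
        if position < edge:
            counts["start"] += 1
        elif position > 1 - edge:
            counts["end"] += 1
        else:
            counts["middle"] += 1
    return counts
```

Like the Band itself, a summary of this kind only says where similarities fall in the text; it cannot say why they exist.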
Common Phrases or Quotations
If you see only one or two stripes, then those are likely common phrases. The sensitivity of the master algorithm is carefully tuned to try to avoid this, but it’s not always successful. These can also be quotations.
Here’s an example of a Similarity Band for The Best of Relations, by Catherine Bilson. You can see that there is a single white stripe indicating a similarity about two-thirds of the way through the book. That similarity was identified as coming from none other than Pride and Prejudice, by Jane Austen – which is not unusual, as The Best of Relations is based on Pride and Prejudice! In this case, it’s a famous line from Jane Austen’s classic that Catherine Bilson added to her novel.
I was given good principles, but left to follow them in pride and conceit.
The Best of Relations/Pride and Prejudice
What Does Plagiarism Look Like?
This is the Similarity Band for Royal Love, a romance novel by Cristiane Serruya. Royal Love is currently part of an ongoing court case filed by famed romance author Nora Roberts against Cristiane Serruya in April 2019, accusing her of plagiarizing lines from as many as forty other romance authors.
The Similar Works system has identified many similarities in Royal Love, spread throughout the book.
In addition to potentially being a great help to authors, PG thinks this is a fascinating field of analysis.
Here’s a link to Similar Works