A Brief History of Article Spinning

From Plagiarism Today:

Article spinning is one of the most talked-about and least-understood technologies when it comes to online plagiarism.

For some, it’s a mythical technology that makes it possible to create unlimited original content. For others, it’s a way to easily defeat plagiarism-detection technology. For others still, it’s an antiquated technology that managed to hang around despite being completely worthless.

To a degree, all of these statements are true, but none of them tells the full story. To really understand article spinning, we need to explore what it is and, most importantly, the history of the technology.

Only by understanding where it’s been can we understand where it’s going and why it’s still relevant over 14 years after it first became an internet phenomenon.

. . . .

Article spinning is a technique to generate seemingly original content from old content by replacing words or phrases with synonyms.

For example, if I were to write the sentence “The cat walked into the house,” an article spinner might reinterpret that as “The feline strolled into the home” or “The kitty wandered into the shelter.”

This is done through “spintax,” a syntax that tells the article spinner which words to swap out. For example, in spintax the sentence above might be written as “The {cat|feline|kitty} {walked|strolled|wandered} into the {house|home|shelter}.”
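To make the mechanics concrete, here is a minimal Python sketch of a spintax expander. It assumes the simple {option|option} format described above; the function name spin is hypothetical and not taken from ArticleBot or any other real spinning tool, which may parse spintax differently.

```python
import random
import re

# Matches an innermost {a|b|c} group, i.e. one that contains no other braces.
SPIN_GROUP = re.compile(r"\{([^{}]*)\}")


def spin(template: str, rng: random.Random) -> str:
    """Return one random variant of a spintax template (hypothetical helper).

    Innermost groups are resolved first, so nested spintax such as
    "{a {b|c}|d}" is also handled.
    """
    while (match := SPIN_GROUP.search(template)) is not None:
        options = match.group(1).split("|")
        # Replace the whole {...} group with one randomly chosen option.
        template = template[: match.start()] + rng.choice(options) + template[match.end():]
    return template


rng = random.Random(2005)  # fixed seed so runs are repeatable
sentence = "The {cat|feline|kitty} {walked|strolled|wandered} into the {house|home|shelter}."
for _ in range(3):
    print(spin(sentence, rng))
```

Each call independently picks one option per group, so repeated calls produce varied sentences drawn from the 27 possible combinations.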

The power of this is that, across a 500- to 1,000-word article, a computer can automatically create thousands or even millions of permutations, each at least slightly different from the others. While most of these permutations might not fool human readers, they might fool computer algorithms, such as those used by search engines or plagiarism detection services.
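The “thousands or even millions” figure follows from simple multiplication: every group with k options multiplies the number of possible outputs by k. The sketch below counts variants for flat (non-nested) spintax. The helper count_variants is hypothetical, the 100-group article is an invented stand-in rather than a real spun article, and the result is an upper bound on distinct text, since two different choices can occasionally produce the same string.

```python
import math
import re

# Matches a flat {a|b|c} group; nested groups are not handled in this sketch.
SPIN_GROUP = re.compile(r"\{([^{}]*)\}")


def count_variants(template: str) -> int:
    """Upper bound on the number of spins of a flat spintax template (hypothetical helper)."""
    # Each group contributes a factor equal to its number of options.
    return math.prod(len(group.split("|")) for group in SPIN_GROUP.findall(template))


sentence = "The {cat|feline|kitty} {walked|strolled|wandered} into the {house|home|shelter}."
print(count_variants(sentence))  # 27

# Hypothetical article: 100 spun words, each with three synonyms.
article = " ".join(["{word|term|expression}"] * 100)
print(len(str(count_variants(article))))  # 48 (digits in 3**100)
```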

In short, article spinning is usually about creating a large quantity of content in a bid to fool other algorithms. As you might imagine, this technology did not get its start in ethical uses; instead, its origins lie almost exclusively in the realm of spam and unethical search engine optimization tactics.

. . . .

Created by Don Harrold in 2005, ArticleBot was a surprisingly basic tool. Users would copy and paste content into it, and ArticleBot, using the method described above, would generate thousands of articles based on it. It didn’t incorporate an RSS scraper (though one was rumored), nor did it interact with article directories. It was up to the user to write or obtain the content, as its only scraper pulled from search results.

This didn’t mean ArticleBot was an ethical tool. Though Harrold said he created it as a way to keep search engines from stifling free speech, the tool was primarily used for black hat SEO, as a way to generate a large volume of seemingly original content.

This was especially important at the time because one of the key concerns in search engine optimization was duplicate content. Though Google repeatedly claimed it didn’t penalize duplicate content (a claim it still makes today), it was also widely known that pages with similar content would not rank highly side by side in search results. Spammers were shifting away from simply repeating the same content over and over and were using ArticleBot (and similar tools) to save time.
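Why would swapping a few words matter to a duplicate-content check? One common, generic way to measure near-duplication is to break text into overlapping word sequences (“shingles”) and compare the sets with Jaccard similarity. The sketch below is an illustrative assumption: it is not how Google, or any particular plagiarism service, is known to work, and the helper names shingles and jaccard are hypothetical.

```python
from __future__ import annotations

import re


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Set of overlapping n-word sequences, lowercased, punctuation ignored."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i : i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared shingles divided by all distinct shingles (1.0 = identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


original = "The cat walked into the house."
one_swap = "The cat walked into the home."
spun = "The feline strolled into the home."

print(jaccard(shingles(original), shingles(original)))  # 1.0
print(jaccard(shingles(original), shingles(one_swap)))  # 0.6
print(jaccard(shingles(original), shingles(spun)))      # 0.0
```

In this toy case, swapping three of six words drops the overlap to zero, which shows how spun copies could slip past checks that rely on exact matching, even though a human reader would spot the resemblance immediately.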

. . . .

On February 23, 2011, Google made one of its most significant algorithm changes in all of its history. Impacting a full 12% of all search results, the Panda/Farmer update was a nuke dropped on article spinning.

The update wasn’t aimed at spinning sites. It was aimed at a phenomenon known as “content farms,” where sites would pay humans to write short, low-quality articles by the dozen. Demand Media, perhaps the best known of such content farms, was effectively destroyed by this update.

Still, when one looks at the components of a content farm (low-quality content, lots of ads and very low engagement), it’s easy to see why spam sites engaged in content spinning were also impacted. This is especially true when combined with an “attribution” update, released less than a month earlier, that targeted content scrapers.

Panda would be updated six more times before 2011 was over (and many more times since). Google would also introduce a Penguin update on April 24, 2012, that directly targeted spam sites. Though much smaller, affecting only 3.1% of search results, it was another nail in the coffin of content spinning as an SEO tactic.

In January 2011, SEO sites were happily touting the benefits of article spinning. By December, they were explaining why it was a bad idea.

. . . .

The rise of plagiarism detection services such as Turnitin has led many students to try to find ways to fool them. Some have turned to article spinning as a way to quickly “rewrite” a piece and escape detection.

Purveyors of article spinning technology have been all too happy to meet that demand. Often referring to the technology as “Automatic Paraphrasing,” they offer spinning tools to students for just this purpose.

Unfortunately, the results of such tools are usually poor. In one Reddit thread, for example, a teacher mocked a student whose paper on George Orwell’s 1984 spun the phrase “Big Brother is watching you” into “Enormous Sibling is viewing you.”
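The source doesn’t describe how these “automatic paraphrasing” tools work internally, but the Orwell example is the classic symptom of word-by-word synonym replacement that ignores context. Here is a minimal sketch of that failure mode, assuming a tiny, hypothetical synonym table (SYNONYMS) and a hypothetical helper (naive_paraphrase); real tools may use much larger thesauri or different techniques entirely.

```python
import re

# Hypothetical, hand-picked synonym table; real tools would use a far larger one.
SYNONYMS = {
    "big": "enormous",
    "brother": "sibling",
    "watching": "viewing",
}


def naive_paraphrase(text: str) -> str:
    """Swap every word found in SYNONYMS, with no regard for context.

    A leading capital letter is preserved, so "Big" becomes "Enormous".
    """

    def swap(match: re.Match) -> str:
        word = match.group(0)
        replacement = SYNONYMS.get(word.lower())
        if replacement is None:
            return word  # unknown words pass through unchanged
        return replacement.capitalize() if word[0].isupper() else replacement

    return re.sub(r"[A-Za-z]+", swap, text)


print(naive_paraphrase("Big Brother is watching you"))
# Enormous Sibling is viewing you
```

Nothing in the lookup knows that “Big Brother” is a proper name from the novel, which is exactly the kind of error the teacher spotted.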

Link to the rest at Plagiarism Today

PG tried out a free online paraphrasing system. For the original paragraph, he inserted the first paragraph from the Declaration of Independence, published July 4, 1776, which reads:

When in the Course of human events it becomes necessary for one people to dissolve the political bands which have connected them with another and to assume among the powers of the earth, the separate and equal station to which the Laws of Nature and of Nature’s God entitle them, a decent respect to the opinions of mankind requires that they should declare the causes which impel them to the separation.

Here’s the paraphrase:

When in the Course of human occasions it gets to be fundamental for one individuals to break down the political groups which have associated them with another and to accept among the powers of the soil, the partitioned and rise to station to which the Laws of Nature and of Nature’s God entitle them, a better than average regard to the conclusions of mankind requires that they ought to announce the causes which prompt them to the partition.

Here are the first few sentences of Jack Kerouac’s On The Road:

I first met Dean not long after my wife and I split up. I had just gotten over a serious illness that I won’t bother to talk about, except that it had something to do with the miserably weary split-up and my feeling that everything was dead. With the coming of Dean Moriarty began the part of my life you could call my life on the road. Before that I’d often dreamed of going West to see the country, always vaguely planning and never taking off. Dean is the perfect guy for the road because he actually was born on the road, when his parents were passing through Salt Lake City in 1926, in a jalopy, on their way to Los Angeles.

And the paraphrase:

I principal met dignitary not long following my wife Furthermore i part up. I needed barely gotten over An not kidding disease that i won’t trouble with banter about, but that it required something to do for the miserably weary split-up and my feeling that All that might have been dead. With those happening to dignitary Moriarty started the and only my life you Might bring my life out and about. In the recent past that I’d frequently envisioned about setting off West to see those country, generally ambiguously arranging Also never taking off. Dignitary may be those immaculate guy for the street in view he really might have been destined on the road, The point when as much guardians were passim through salt lake city Previously, 1926, for An jalopy, for their lifestyle with los angeles.

PG :