Your news articles are not as unique as you might think

The recent Google algorithm updates keep reminding us that original content is key to SEO performance.

However, we know that news leaders tend to be copied quite quickly. Large-scale plagiarism is a problem most publishers are aware of.

But how big of a problem is it? How unique are news articles?

We studied 200k of them from 100 news leaders to find out.

TL;DR:

  • 62% of articles are not unique anymore.
  • The amount of text copied goes way beyond a few quotes.
  • Take a sample of 2,000 articles; you’ll find their content on 16,000 pages across the web.
  • There are three main opportunities for news leaders.

Original content is vital for SEO performance, including for news sites

SEO still king of the unpaid marketing channels

The 2022 Gartner CMO Spend Survey shows that SEO remains the top unpaid channel investment in 2022. While it slightly decreased compared to 2021, it still represents 8.5% of the total marketing budget.

Large publishers are no exception: traffic to news sites heavily depends on the SEO performance of their articles. Users trust them for accurate and timely information.

Google’s growing concern for original, unique content

Google has long been rewarding original content. The more unique the article is, the better it performs.

Back in August 2019, a core update focused on content quality. See for yourself what the first question was for self-assessment:

Does the content provide original information, reporting, research or analysis?

This July, the Google Search team updated the general guidelines for webmasters. Here is what they ask reviewers to look for when evaluating the main content (MC) of a page:

A factor that often distinguishes very high quality MC is the creation of unique and original content for the specific website. While what constitutes original content may be very different depending on the type of website, here are some examples:

● For news: very high quality MC is original reporting that provides information that would not otherwise have been known had the article not revealed it. Original, in-depth, and investigative reporting requires a high degree of skill, time, and effort. Often very high quality news content will include a description of primary sources and other original reporting referenced during the content creation process. Very high quality news content must be accurate and should meet professional journalistic standards.

More recently, Google introduced the “helpful content update”, designed to promote content that is helpful and informative. This type of content is typically original, as it’s based on the author’s unique perspective and expertise.

The constant improvement of Google’s algorithms should make it harder for plagiarists to pass off someone else’s work as their own, shouldn’t it?

The fight against plagiarism is far from over

One might think that the fight against plagiarism is over. After all, Google has been working on it for years.

Well, we’re not there yet.

First of all, news sites must ensure that their own articles do not contain plagiarized passages. We see a new plagiarism scandal among top publishers every now and then. As the latest Economist/YouGov poll reminds us, many news sites suffer from a lack of trust, and plagiarism never helps with that. Keeping the bar high in the editorial process is critical here.

Monitoring internal work is not enough. Most of the work involves checking that competitors do not steal from you and benefit from it. Experience shows us that infringing sites are still thriving. They steal content at scale and get the visibility they do not deserve.

Google’s algorithms can only do so much. They can’t detect plagiarism on their own. Google expects publishers to actively defend their content against plagiarism by using its DMCA form.

Here is what we learned

Anecdotal experience is great, but a study is even better.

Let’s see what we learned from our study of 200,000 articles from 100 news leaders across 10 countries.

We focused on the top ten most visible news sites in the following countries: the United States, Germany, the United Kingdom, India, France, Italy, Canada, Australia, Spain, and Mexico.

You can find the most recent ranking of news leaders on the NewzDash site. Shout out to John and his team.

To see how often articles were plagiarized, we sampled 2,000 recent articles from each publisher and ran a plagiarism check.

So, how original are news articles?

Learning 1: 62% of articles are not unique anymore

In a sample of 2,000 articles, 1,240 are no longer unique.

I’m not surprised at all. We observed the same order of magnitude when studying blogs of top SaaS businesses.

This proportion varies a lot from site to site and country to country.

The US and Canada suffer the most on that metric, with only 24% and 28% unique articles, respectively. They are followed by:

  • The UK, Australia, and Italy in the 33%-34% range.
  • India, Mexico and Germany in the 39%-42% range.

Spanish and French sites are less copied, with 51% and 53% unique articles.
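As a rough consistency check, the country figures above can be averaged and compared with the headline 62%. The sketch below assumes every country contributes the same number of articles (10 sites with 2,000 articles each, as in the study setup) and uses the bounds of the quoted ranges, since the exact values inside each range are not given.

```python
# Share of unique articles per country, as (low, high) bounds in percent.
# Single values are repeated; ranges come from the figures quoted above.
# Assumption: each country weighs the same in the overall average.
unique_share = {
    "United States": (24, 24),
    "Canada": (28, 28),
    "United Kingdom": (33, 34),
    "Australia": (33, 34),
    "Italy": (33, 34),
    "India": (39, 42),
    "Mexico": (39, 42),
    "Germany": (39, 42),
    "Spain": (51, 51),
    "France": (53, 53),
}


def average_bounds(shares):
    """Return (low, high) bounds of the unweighted mean across countries."""
    n = len(shares)
    low = sum(lo for lo, _ in shares.values()) / n
    high = sum(hi for _, hi in shares.values()) / n
    return low, high


low, high = average_bounds(unique_share)
print(f"Unique articles overall: between {low:.1f}% and {high:.1f}%")
print(f"Not unique overall: between {100 - high:.1f}% and {100 - low:.1f}%")
```

Under these assumptions, the share of non-unique articles lands between 61.6% and 62.8%, which is in line with the 62% reported for the whole study.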

If we look at individual sites, we spot scary data points: one US news leader has as few as 5% unique articles.

Learning 2: it’s not just a few quotes

If we focus on articles that are not unique anymore, how copied are they?

Our study shows it goes way beyond a few quotes. That means that what we observe is not unintentional plagiarism but a deliberate attempt to copy content.

Here are three stats to get a better grasp of the situation:

  • 18% of these articles have absolutely no original text anymore.
  • Half have more than 85% of their text found elsewhere on the web.
  • On average, at most 30% of the text is unique.
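To make "text found elsewhere" more concrete, here is a minimal sketch of one common way to estimate it: split each text into overlapping word sequences (shingles) and count how many of an article's shingles also show up on other pages. This is an illustrative technique, not necessarily the method used in the study; the function names, the 5-word shingle size, and the sample texts are all assumptions.

```python
import re
from statistics import median


def shingles(text, n=5):
    """Return the set of overlapping n-word sequences in a text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def copied_share(article, other_pages, n=5):
    """Fraction of the article's shingles that also appear on any other page."""
    own = shingles(article, n)
    if not own:
        return 0.0
    seen_elsewhere = set()
    for page in other_pages:
        seen_elsewhere |= shingles(page, n)
    return len(own & seen_elsewhere) / len(own)


def summarize(shares):
    """Summary stats in the spirit of the three bullets above."""
    return {
        "fully_copied": sum(s == 1.0 for s in shares) / len(shares),
        "median_copied": median(shares),
        "mean_unique": 1 - sum(shares) / len(shares),
    }


# Made-up texts, for illustration only.
article = "the mayor announced a new transit plan for the city on monday morning"
pages = [
    "breaking: the mayor announced a new transit plan for the city on monday morning, sources say",
    "an unrelated story about football results from the weekend",
]
print(copied_share(article, pages))  # 1.0: every 5-word sequence is on the first page
```

A real check would also need to fetch candidate pages and ignore legitimate quotes, which this sketch leaves out.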

Learning 3: it’s not just a few sites

We know that most articles are copied, and that most of their content is. Maybe it’s just a few sites, right? A couple of news aggregators and a few domains you do not care about…

Not really.

In an average sample of 2,000 articles, similar content can be found on 16,000 web pages across more than 2,500 domains.
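A quick back-of-the-envelope calculation puts these numbers in perspective. It assumes the 16,000 pages all relate to the 62% of articles that are no longer unique and that each page is counted once; both are simplifications, not figures from the study.

```python
sample_size = 2_000          # articles in an average publisher sample
non_unique_rate = 0.62       # share of articles that are no longer unique
pages_with_copies = 16_000   # pages carrying similar content
domains = 2_500              # "more than 2,500", so this is a lower bound

non_unique_articles = round(sample_size * non_unique_rate)
pages_per_copied_article = pages_with_copies / non_unique_articles
# Since the domain count is a lower bound, this ratio is an upper bound.
pages_per_domain = pages_with_copies / domains

print(f"Non-unique articles in the sample: {non_unique_articles}")
print(f"Pages per copied article: about {pages_per_copied_article:.1f}")
print(f"Pages per copying domain: at most {pages_per_domain:.1f}")
```

In other words, each copied article would show up on roughly 13 other pages, and the average copying domain would hold about six of those pages.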

Once again, it varies a lot by country, with US sites being copied the most.

The three main opportunities for news leaders

Opportunity 1: Step up the editorial process

Even though the problem will mostly come from outside your organization, you should prevent it on your side.

If not implemented yet, make plagiarism checks mandatory in the editorial process.

Opportunity 2: Take down thieves

Make it a habit to take down infringing content.

The DMCA form is one of the best ways to report plagiarism. The one from Google is accessible via the Google Search Console. It allows you to request the removal of content that infringes on your copyright.

My advice: do not try doing it manually. Use PlagiaShield to quickly identify domains copying you. Then, use our Chrome extension to fill in DMCAs in record time.

My second piece of advice: do it weekly. The sooner you remove infringing content, the sooner you can recover your visibility.
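Whatever tool produces your list of infringing pages, it helps to group them by domain so the weekly takedown session starts with the worst offenders. Here is a minimal sketch using only the Python standard library; the function name and the URLs are hypothetical, and this is not PlagiaShield's API.

```python
from collections import Counter
from urllib.parse import urlparse


def domains_by_copy_count(urls):
    """Count infringing URLs per domain, most frequent first."""
    counts = Counter()
    for url in urls:
        host = urlparse(url).netloc.lower()
        # Treat "www.example.com" and "example.com" as the same site.
        if host.startswith("www."):
            host = host[4:]
        if host:
            counts[host] += 1
    return counts.most_common()


# Hypothetical URLs, e.g. exported from a plagiarism report.
infringing_urls = [
    "https://www.scraper-one.example/news/copied-story",
    "https://scraper-one.example/news/another-copy",
    "https://aggregator-two.example/a/123",
]

for domain, count in domains_by_copy_count(infringing_urls):
    print(domain, count)
```

Sorting by count is only one way to prioritize; you may prefer to rank domains by how visible they are in search results.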

Opportunity 3: Claim backlinks

Mistakes happen.

You’ll find many pieces of content that reuse your work but simply failed to credit you.

Spot them and claim your well-deserved backlinks.

What’s next?

Even though we only covered a small percentage of news sites, I feel like the study gives a good glimpse of the state of plagiarism in the news industry.

I encourage you to conduct a thorough audit of your sites using PlagiaShield. It will probably uncover a lot of infringing content, but we’ll provide you with the tools to fight it.