60% of top SaaS blog articles are not unique - here is why it's okay
This is our first SaaS edition of the state of plagiarism.
Since the launch of PlagiaShield, many SEO managers from SaaS companies have asked me about general statistics regarding how unique most content is in their industry.
What should they expect? A third of their content stolen? More than that?
We selected 50 blogs from the top SaaS companies and ran them in PlagiaShield.
Here is what we learned:
- The average top SaaS blog contains about 1000 articles.
- 600 of them are not unique anymore.
- For those pages with potentially stolen content, 54% of the text is found elsewhere on the web.
SEO professionals within SaaS companies can leverage their own scan to protect their content from plagiarism and uncover backlink opportunities.
Top SaaS companies love SEO
As the 2020-2021 Gartner CMO Spend Survey suggests, Search Engine Optimization (SEO) represents about 10% of businesses’ marketing budget.
Top SaaS companies invest in their blogs to gain visibility and acquire customers. It is one of the most significant sources of traffic for SaaS websites, accounting for 26.4% of traffic on average across the 50 biggest SaaS companies.
The performance of SaaS blogs is critical in highly competitive markets where the cost-per-click of paid campaigns can reach dozens of dollars.
Take a look at the following keywords as examples (data from Mangools) in the US market:
- “project management software”: $42 per click, 22,600 searches/month
- “marketing automation”: $30 per click, 6,000 searches/month
- “crm for small business”: $45 per click, 2,400 searches/month
A great SaaS blog requires original content
After many of its search algorithm updates, Google has reminded webmasters to:
- focus on content,
- and ask themselves the following question: “Does the content provide original information, reporting, research or analysis?”
No wonder writing and publishing unique content is essential to ranking among the top results for your target keywords.
Doing it at scale with consistency requires enormous investment from in-house resources and marketing partners.
When scaling content, many SaaS companies have a clear editorial process. Before publication, each article goes through a plagiarism scan to ensure the employee/agency/freelance writer created original content.
However, they tend to underestimate how quickly their work is used once published. Here are two of the most common cases.
- Plagiarism to compete with your work: someone used most of your content without citing your work or getting your authorization. A large portion of the text is pure copy/paste. Some sentences have undergone minor modifications.
- Unintentional plagiarism: the author used a few sentences without proper citation, primarily due to a lack of knowledge regarding their obligations as a writer.
We’ll cover how to deal with each of them in a few sections.
A clear lack of data regarding unique content
This first edition tackles plagiarism in the SaaS industry. More precisely, we measured how unique the articles on top SaaS blogs are.
We’ll do our best to improve our methodology year after year, while being as consistent as possible. I look forward to hearing your feedback and recommendations.
What are the top 50 SaaS companies?
Getting the list of the largest SaaS companies by revenue was trickier than expected. We needed a reliable and rather exhaustive source from which we could get data year after year. We ended up using the public lists from GetLatka after some cleaning.
We considered these other sources, but they had significant caveats:
- Crunchbase: major SaaS players were missing, and many small companies were ranked within the top companies.
- Mike Sonders prepared a fantastic list, but the data is from mid-January. 2020 brought a lot of change, and we needed more recent data.
Do top SaaS brands host their blog on a dedicated subdomain?
Most SaaS brands host their blog on their main domain. Ten of them have it on a dedicated subdomain.
I have two anecdotes in this regard:
- It seems Zoho migrated their blog from a subdomain to the main domain. Both versions are coexisting for now. If you know more about this, I’ll be happy to hear your thoughts.
- Palantir was the only brand with a blog outside of their website. They publish all their articles on Medium without any custom domain.
What we learned
We ran each blog in PlagiaShield, with a maximum sample size of 5000 articles. Within the tool, we excluded a few sections when appropriate, such as ‘/category’, ‘/tags’ or ‘/author’.
In total, we compared about 2.6M pages and found 85k pages potentially plagiarising the top SaaS blogs.
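To illustrate the section exclusion and the sample cap described above, here is a minimal Python sketch. It is not how PlagiaShield works internally. The names `EXCLUDED_SECTIONS`, `is_article_url` and `sample_articles` are hypothetical, and the matching rule (a section name appearing as a full path segment) is an assumption.

```python
from urllib.parse import urlparse

# Hypothetical exclusion list, taken from the examples given in the text.
EXCLUDED_SECTIONS = ("/category", "/tags", "/author")


def is_article_url(url: str) -> bool:
    """Return True when the URL path contains no excluded section."""
    path = urlparse(url).path.lower()
    # Exclude the section page itself and anything nested under it.
    return not any(
        path.endswith(section) or f"{section}/" in path
        for section in EXCLUDED_SECTIONS
    )


def sample_articles(urls, max_size=5000):
    """Keep article URLs only and cap the sample at max_size."""
    return [u for u in urls if is_article_url(u)][:max_size]
```

Matching on full path segments keeps a post such as `/blog/authority-building` in the sample even though its slug starts with "author".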
Learning 1: About 1000 articles per blog
On average, the top SaaS blogs contained 1040 articles (median: 773). Note that we only considered public pages with more than 500 characters.
Learning 2: 2200 words per article
Articles we monitored contained about 2200 words on average. The median is 1050 words.
It’s interesting to look at the distribution of article length. We split it into two graphs so you can better appreciate the details.
Here is the distribution for articles of fewer than 1000 words. If we omit the very concise ones, we can see a spike around 500 words.
And here is the distribution for articles of more than 1000 words. We clearly see that many of them fall in the 1000 to 2000 word range.
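The gap between the average (2200 words) and the median (1050 words) suggests that a few very long articles pull the average up. The short Python sketch below shows the effect. The word counts are made up for illustration and chosen to reproduce the same two figures. They are not data from the study.

```python
from statistics import mean, median

# Hypothetical word counts for ten articles (illustrative, not study data).
word_counts = [300, 450, 500, 550, 900, 1200, 1500, 1800, 4800, 10000]

print(f"mean: {mean(word_counts):.0f} words")      # mean: 2200 words
print(f"median: {median(word_counts):.0f} words")  # median: 1050 words
```

In this made-up sample, the two longest articles account for about two thirds of all words, which is why the median is a better picture of the typical article.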
Learning 3: 60% of blog posts are not unique
For a typical top SaaS blog of 1000 articles, 600 of them are not unique anymore.
This was a surprise to me as I expected a ratio closer to 50%.
Right below, you can see the distribution of SaaS blogs based on the percentage of articles that are not original. The curve tells us that many blogs cluster around the 45% and 85% marks.
Learning 4: More than half of the text is used elsewhere
When looking at non-unique pages, 54% of their text can be found elsewhere on the web. The median is 50%.
Right below is the distribution of articles based on the percentage of text that is unique to them. Only articles with non-unique content are represented on this graph. The first spike on the left of the chart gathers all articles with absolutely no unique sentence.
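To make the "percentage of text found elsewhere" metric concrete, here is a minimal Python sketch. It assumes the share is measured in words belonging to sentences that appear verbatim on another page. The study does not describe its exact method, so the functions `split_sentences` and `share_found_elsewhere` are hypothetical and only show one possible way to compute such a figure.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split on sentence-ending punctuation, then normalize case and spacing."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [" ".join(p.lower().split()) for p in parts if p.strip()]


def share_found_elsewhere(article: str, other_pages: list[str]) -> float:
    """Fraction of the article's words that sit in sentences found
    verbatim (after normalization) on at least one other page."""
    seen_elsewhere = {s for page in other_pages for s in split_sentences(page)}
    sentences = split_sentences(article)
    total_words = sum(len(s.split()) for s in sentences)
    if total_words == 0:
        return 0.0
    copied_words = sum(len(s.split()) for s in sentences if s in seen_elsewhere)
    return copied_words / total_words
```

An article with a share of 1.0 under this definition would land in the leftmost spike of the chart: no unique sentence at all.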
Learning 5: 1600 potential thieves per blog
The typical SaaS blog had about 1600 potential thieves.
We say “potential” because we do not know for sure if it is plagiarism.
Indeed, finding that a text is “not unique” can mean a number of things:
- The article has been fully plagiarized.
- A few sentences have been copied.
- The article has been distributed by authorized partners.
Here are a few cases where it is expected to find your content:
- PR distribution websites sharing your latest announcements.
- Podcast directories using your episodes’ description.
- Medium articles from your employees that you distributed on your site.
- Guest posts you wrote using large parts of existing blog posts.
There are other common situations:
- The staging site is fully/partially indexed. This requires an immediate fix.
- The content of the blog has been widely used in other subdomains.
Two main opportunities for SaaS businesses
SEO managers working for SaaS businesses have two main ways to act on the results after completing their scan:
- Take down the thieves.
- Claim backlinks and/or canonical links from the others.
Opportunity 1: Take down the thieves
PlagiaShield makes it easy to identify domains copying your articles. Data collected across all articles is aggregated and enriched at the domain level.
Some websites might have stolen a lot of your pages.
Depending on your case, you can either:
- Send a simple email asking the fraudulent domain owner to remove the infringing pages.
or
- Get legal help from a professional: send a formal letter asking for the removal of the pages as well as compensation for damages. Asking for compensation might be an effective way to motivate the domain owner not to copy you again.
The second tool you should leverage is the DMCA form search engines provide. It allows you to ask for the removal of the infringing pages from their search index. PlagiaShield prepares this form so you can be more productive and get a higher success rate.
Opportunity 2: Claim backlinks
This is a new strategy to gain dozens to hundreds of backlinks.
If you or your website has high authority within your community, many writers have likely found ‘inspiration’ in your posts. However, they might not have given you full credit.
In some cases, this would fit the definition of “plagiarism”. Instead of sending a request to remove your content from their text, it might be a much better move to ask for a proper citation with a rightful backlink.
As an example, I will use PlagiaShield to uncover backlink opportunities generated by this very study. The tool will:
- find all public pages using this blog post
- show me sentences with exact and partial matches
- show me if the page links to PlagiaShield
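The last two checks in the list above can be sketched in a few lines of Python using only the standard library. This is an illustration under stated assumptions, not PlagiaShield's implementation: `LinkCollector`, `links_to` and `classify_match` are hypothetical names, and the 0.8 similarity threshold for a partial match is an arbitrary value.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def links_to(html: str, domain: str) -> bool:
    """Return True if any link on the page points to domain or a subdomain."""
    collector = LinkCollector()
    collector.feed(html)
    for href in collector.hrefs:
        host = (urlparse(href).hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            return True
    return False


def classify_match(original: str, candidate: str, threshold: float = 0.8) -> str:
    """Label a candidate sentence as an exact, partial, or absent match."""
    a, b = original.lower().strip(), candidate.lower().strip()
    if a == b:
        return "exact"
    # Assumed threshold: similarity ratios at or above it count as partial matches.
    return "partial" if SequenceMatcher(None, a, b).ratio() >= threshold else "none"
```

A page with exact or partial matches for which `links_to` returns False is a candidate for a citation request with a backlink.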
What about your SaaS blog?
As I said, this is our very first edition of the “State of plagiarism”.
I’d love to hear your feedback. Pick your preferred channel to engage with the community and us.
What’s your experience with plagiarism?
Is there something we missed and that you would like us to cover?
Of course, give PlagiaShield a try! Scan your domain for free to uncover great opportunities.