Correlation, causation, and confusion: The Backlinko study of 912 million blog posts

Brian Dean of Backlinko, an SEO maven whose site includes a technique that “almost guarantees that you get high quality links from every piece of content that you publish,” published a study of 912 million blog posts. He says his results include worthwhile insights. I say they support an alternate explanation: nearly all content marketing sucks big time.

As Dean describes in his post (there’s the backlink he was hoping for!), Backlinko analyzed 912 million posts using data from BuzzSumo. I’ll share his findings and talk about what they mean — and don’t mean — for bloggers.

(By the way, this is a perfect example of what Jay Acunzo writes about best practices — if you do everything this study suggests but lack inspiration, you’ll create lame marketing that fails to move any needle that matters.)

1 Long-Form Content Generates More Backlinks Than Short Blog Posts

Long posts (3,000 words or more) get an average of more than 4 links from other sites. Short posts (fewer than 1,000 words) get an average of 2.3. Middle-length posts get somewhere in the middle.

Image: Backlinko. (Shouldn’t the source appear right in the image? I thought that was a best practice. Now this damn graphic will spread all across the Web, shorn of its sample size and source, and become some sort of sourceless conventional wisdom. Sigh.)

So should you make longer posts? Dean says this:

Key Takeaway: Content longer than 3000 words gets an average of 77.2% more referring domain links than content shorter than 1000 words.

Sorry, kids, that may be true, but it’s probably bullshit. Here are some possible interpretations that explain why:

  • A small proportion of long posts are fascinating, detailed analyses of topics that generate a lot of interest. Those posts drag the average up. I bet the median number of links to a long post is zero — that is, more than half the posts get no links at all. That would be completely consistent with the data shown here.
  • If most of the short posts are crappy valueless rambling, that would drive the average for the short posts down.
  • It may be that among the small number of intelligent people blogging, long posts are more popular. That is, perhaps longer posts are correlated with more thoughtful people, who are in turn correlated with a higher number of links. After all, it’s easy to generate 500 words of drivel, but much harder to maintain the stamina to create 4,000 vacuous words.

Correlation is not causation. But it’s worse than that. We don’t know what the correlation between post length and backlinks is. It could very well be quite close to zero. Outliers could still create the results shown here.
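To make the outlier argument concrete, here is a minimal sketch using link counts I made up for illustration. They are not Backlinko's data; I picked them only so the averages land near the 4 and 2.3 reported above. The names long_posts and short_posts are hypothetical.

```python
from statistics import mean, median

# Invented link counts for illustration only; this is NOT Backlinko's data.
# 100 hypothetical long posts: 96 get no links, 4 standouts get 100 each.
long_posts = [0] * 96 + [100] * 4
# 100 hypothetical short posts: 95 get no links, 5 get 46 each.
short_posts = [0] * 95 + [46] * 5

long_mean, short_mean = mean(long_posts), mean(short_posts)
long_median, short_median = median(long_posts), median(short_posts)

# The means differ a lot, but the typical (median) post in each group gets nothing.
print(f"long:  mean {long_mean:.1f}, median {long_median:.0f}")
print(f"short: mean {short_mean:.1f}, median {short_median:.0f}")
```

In this toy data set, long posts average 4.0 links and short posts 2.3, yet the median in both groups is zero. A handful of standout posts is enough to produce the gap.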

As Dean himself says: “While it’s impossible to draw any firm conclusions from our study, our data suggests that backlinks are at least part of the reason that long-form content tends to rank in Google’s search results.”

2 The Ideal Content Length For Maximizing Social Shares Is 1,000-2,000 Words

Average shares for less than 1,000 words: 150. For 1,000 to 2,000 words: 260. Dean: “Articles between 1k-2k words get an average of 56.1% more social shares than content that’s less than 1000 words.”

This is subject to the same correlation-versus-causation argument as item 1. I’m sure most posts don’t get shared at all.

People read and share more short posts because they’re impatient. Does that mean you should write a shorter post? I’m all in favor of fewer words, but Dean’s post has 2,862 words and it’s been shared 1,142 times. Maybe it had to be long. So maybe your post should be long, too. It depends on the content.

3 The Vast Majority of Content Gets Zero Links

You have to embrace this incredible graphic demonstrating this fact:

94% of content published gets zero external links
Image: Backlinko

This fact alone backs up my dismantling of point 1: if 94% of posts get no links at all, the median post gets zero, and a small minority is driving every average. Dean even says this: “only 2.2% of content generates links from multiple websites.” Why so few? Dean: “While it’s impossible to answer this question from our data alone, it’s likely due to a sharp increase in the amount of content that’s published every day.”

Here’s a revelation: most content is crap. Most companies and people don’t have the ability to invest in it and promote it. Crap without sufficient promotion generates no interest. Ergo, no links. It’s not rocket science.

It’s Sturgeon’s Law, people.

4 A Small Number of “Power Posts” Get a Large Proportion of Shares

Dean says that 1.3% of articles get 75% of the shares. (Theodore Sturgeon would say that 90% of posts are crap; obviously, when it comes to blog posts, he’d be an optimist.) And “0.1% of articles in our sample got 50% of the total amount of social shares.”

My takeaway: awesome stuff goes viral. Some of it. Other awesome stuff doesn’t. Nobody can predict it. (On my blog, 34% of all the traffic has come from one post.)
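If you want to run the same kind of concentration check on your own blog, here is a rough sketch. The function name fraction_of_posts_for_share is my own, and the share counts are invented so that a tiny slice of posts holds 75% of the total; nothing here comes from Dean's data set.

```python
from statistics import mean, median

def fraction_of_posts_for_share(shares, target=0.75):
    """Smallest fraction of posts (most-shared first) that holds `target` of all shares."""
    total = sum(shares)
    if total == 0:
        return 0.0
    running = 0
    for count, value in enumerate(sorted(shares, reverse=True), start=1):
        running += value
        if running >= target * total:
            return count / len(shares)
    return 1.0

# Invented share counts for 1,000 hypothetical posts (illustration only).
shares = [600] * 13 + [10] * 260 + [0] * 727

top_fraction = fraction_of_posts_for_share(shares)
print(f"{top_fraction:.1%} of posts hold 75% of shares")
print(f"mean {mean(shares):.1f}, median {median(shares):.0f}")
```

With these made-up numbers, 1.3% of the posts hold 75% of the shares, the mean is 10.4 shares per post, and the median is zero. When a distribution looks like that, the mean says almost nothing about a typical post.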

5 There’s Virtually No Correlation Between Social Shares and Backlinks

Dean: “We found no correlation between social shares and backlinks (Pearson correlation coefficient of 0.078).”

If Dean could calculate a correlation coefficient, why not use it in points 1 and 2? Perhaps it undermines the desired conclusion?
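Here is a sketch of what that check might look like for point 1. The data is invented (200 hypothetical posts, not Backlinko's), arranged so that long posts average 4.0 links and short posts average 2.3. The pearson helper is a plain textbook implementation, not anything from Dean's methodology.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Invented data for illustration only: 100 short posts, then 100 long posts.
word_counts = [500] * 100 + [3500] * 100
# Short posts: 95 with no links, 5 with 46 each (mean 2.3).
# Long posts: 96 with no links, 4 with 100 each (mean 4.0).
links = [0] * 95 + [46] * 5 + [0] * 96 + [100] * 4

r = pearson(word_counts, links)
print(f"Pearson r = {r:.3f}")
```

On this toy data the long posts average about 74% more links than the short ones, yet r comes out at roughly 0.05, in the same "no correlation" territory as the 0.078 Dean reports for shares versus backlinks. A big gap in averages and a near-zero correlation can coexist.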

So, do you want sharable content or content that generates links?

If you want to generate shares, write clickbait. That’s awful content marketing, but it will get those sharing stats up!

Or, you could spend the time to create content that’s worthwhile, valuable, and useful, and then it would spread through both links and shares, and rank on Google. Of course, that takes work and inspiration.

6 Long Headlines are Correlated With High Levels of Social Sharing

Despite the word “correlated” in the headline, this is a stat just like the first one. 14+ word headlines get an average of 210 or so shares, while headlines with fewer than five words get about 140.

But based on the earlier-stated fact that 1.3% of the articles get 75% of the shares, this correlation is almost certainly close to zero. Once again, a few very popular articles are very likely skewing the averages.

So should you put a long headline on your article? If it’s crap, it won’t matter. Long headlines won’t get you traffic.

But there is a useful takeaway here, and it’s the first one I will pay heed to. A long headline won’t necessarily make your content less popular.

7 Titles That End With a “?” Get an Above Average Amount of Social Shares

Here’s the chart:

Image: Backlinko

This is subject to the same fallacy as above. Only real takeaway: a question mark won’t hurt your post.

Based on this, my next post will be titled “? and the Mysterians are the best of the one-hit wonder groups.”

8 There’s No “Best Day” to Publish New Content

Posts on Sunday got 152.5 average shares. Posts on Friday got 149.1. If you do the math, that’s a 2% lift for posting on Sunday. Why aren’t we hearing more about that lift?
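For anyone who wants to check the math, the lift is just the difference divided by the baseline. A quick sketch, with percent_lift as a helper name of my own:

```python
def percent_lift(new, baseline):
    """Percentage by which `new` exceeds `baseline`."""
    return (new - baseline) / baseline * 100

# Average shares per post as quoted above.
sunday, friday = 152.5, 149.1

lift = percent_lift(sunday, friday)
print(f"Sunday over Friday: {lift:.1f}% lift")
```

That works out to a bit under 2.3%, which rounds to the 2% above.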

It’s irrelevant. But it’s no less relevant than the rest of these average-based analyses.

On my own blog, posts on Saturday and Sunday do worse than posts on Monday and Friday, which do worse than posts on Tuesday through Thursday. So I’ve learned to save the best content for Tuesday through Thursday, when I have a choice (that is, when it’s not based on a news hook).

My mileage varies. So will yours. Basing your posts on these averages won’t help you make useful decisions.

9 List Posts and “Why Posts” Get a High Level Of Shares Compared to Other Content Formats

Image: Backlinko

This is useful, subject, of course, to the same caveats as the rest of the data.

But if you have a killer infographic or video, it will spread. And if you have a lame-o list, it will not. Perhaps it’s not mostly about the format?

Is my popular “10 top writing tips” post a how-to post or a list post? Hmm. I wonder how they did this analysis of which posts are which.

10 “Why Posts”, “What Posts” and Infographics Are Ideal Content Formats for Acquiring Backlinks

Unless they suck.

11 B2B and B2C Content Have a Similar Share and Link Distribution

According to Dean, B2C content gets shared 9.7x more than B2B content.

But in the same section, 1.3% of B2C articles and 2% of B2B articles generate 75% of shares. So again, we’re looking at outliers skewing the mean.

My advice: if your audience is business buyers, write B2B content. If it is consumers, write B2C content. Since your content marketing should reach your marketing targets, you really don’t have a choice. And Dean’s conclusions are equally spurious for both targets.

Bottom line

Thanks for listening. Create quality content. Share this post. It has 1,400 words and a long title with a number in it, so it should go viral with a little help from you.


  1. Nice analysis, Josh.

    If he made the raw data available, then we could test all hypotheses. I suspect, though, that you are right. Sadly, many people won’t question the findings and just proceed. In the process, they’ll waste a great deal of time.

    Cue obligatory Twain “lies, damned lies, and statistics” quote.

  2. Hey Josh,

    Very interesting post. While I disagree with a lot of your analysis on the statistics, I agree with the premise that it doesn’t make sense to blindly put the results into practice.

    Or as you put it:
    “It has 1,400 words and a long title with a number in it, so it should go viral with a little help from you.”

    Like any study, there are flaws and limitations with ours. I tried to point these out throughout the writeup.

    Also, I really like your writing style. Reminds me a lot of one of my favorite copywriters, Dan Kennedy.

    I just ordered your book from Amazon. I look forward to reading it!