What Bloggers Need to Know About Duplicate Content
Ever read a “new” post someone has just shared and realize it sounds familiar? Often, once you get to the bottom, it’s noted that the post is “Edited from the Archives” or “A Re-Post.” Bloggers everywhere are re-posting their old posts. Often the re-posted version gets more attention and interaction, presumably since they now have more followers and have likely added a pinnable graphic, etc.
At a glance, it looks like a fabulous idea. And the “big bloggers” are doing it, so everyone else assumes that it’s fine. But when you take a closer look, you run into a problem called “duplicate content.” The average hobby or family update blogger may not be concerned about Google rankings and SEO. But if you are serious about being intentional in your blogging, the last thing you want to do is make your blog hard to find because Google penalized you for duplicate content.
But re-posts from the archives aren’t the only form of duplicate content. Here are four things bloggers need to know about duplicate content—and four ways to avoid it.
(Please note: I am not an SEO expert. This post is based on my own research. I have attempted to over-simplify the information I have found into language that the average blogger can understand. Check out the links at the end of this post if you want to read the more technical posts on the subject.)
4 Things Bloggers Need to Know About Duplicate Content
1. Duplicate content can make Google think you are plagiarizing.
There’s more than one reason to “create before you consume”. If you write a post that sounds just like a post someone else wrote, with your own words sprinkled here and there for personalization, Google is going to notice—just like school teachers notice that a paper sounds just like the encyclopedia article on the topic.
2. Duplicate content can make Google think you are a content thief.
If you publish a post that is an exact duplicate of another, it appears to search engines that one of them is stolen (especially if they don’t link to each other). It might be that you are putting a copy of a guest post you wrote on your own blog, but Google doesn’t know that. It could just as easily be that you copied a post you liked from another blog and put it on your own blog. It’s called “content scraping” and it happens all the time. (You can try out a plugin like Plagiarism to automatically search for scraped copies of your blog’s content. If you’ve discovered an instance of your own content being stolen, click here to find out what to do.)
3. Duplicate content confuses search engines about where to send readers.
Search engines are constantly trying to improve their search results to include only the best and most relevant content for your search terms. That means that something has to go—and that will be the duplicate posts, those deemed irrelevant. And if Google looks at your site as a whole and sees a lot of content duplicated on your site and elsewhere? That will affect your site’s rankings.
4. Duplicate content confuses readers.
Even if you’re not worried about your search engine ranking, re-posting an old post is still splitting your blog traffic, likes, and shares between two posts. Not to mention that the related discussion (which is always fabulous–I love reading comments!) is spread out all throughout the archives (which might drive some of us detail people downright crazy). Readers don’t know which post to pin or like–let alone comment on. Duplicate content equals multiplied confusion.
4 Ways Bloggers Can Avoid Duplicate Content Penalties
1. Refresh an old post, don’t re-post from the archives.
It’s tempting—especially when writer’s block hits—to grab an old post, dress it up with a new picture and better grammar, and hit publish again. But that practice is littering your archives with duplicate posts. And Google is going to spot that. Best practice is to edit the old post, and re-share it again via social media and perhaps even your email newsletter. (Watch for the next post filled with ideas and methods for refreshing your archives.)
2. Guest post, don’t submit a previously published post.
Most guest post guidelines specify that you must submit a new, original post—one that has never been published elsewhere, including your own blog. That’s because a guest post that’s already been posted on your own blog isn’t a guest post: it’s duplicate content. Write a guest post that will give potential readers a taste of what they might find on your blog, without giving them your blog posts themselves.
(Looking for tips on writing a guest post? Rachelle Rea writes about how to write a guest post readers will want to read.)
3. Use a different title and intro when linking to your guest post.
When you guest post for someone else, it’s common courtesy (and often an understood agreement) that you’ll write a post on your own blog linking to your guest post. It’s a way to bring your community to theirs, and say thank you for the opportunity. However, you have the potential to hurt your SEO and the site hosting the guest post if you use the same title and beginning paragraphs of your guest post as an introduction on your own blog. It takes a bit more time, but it’s worth the effort to avoid duplicate content issues by creating a unique title and crafting a custom introduction to your guest post. Close your introduction with the words of the link itself in mind:
Visit So and So’s Blog to read my post: Very Brilliant Post title
4. Use excerpts only in your archives.
WordPress is a beautiful content management system in that it creates many ways to categorize and organize your content. However, those could also spell potential duplicate content issues. “Canonical” URLs help resolve this issue, but as an extra precaution, it’s a good idea not to show full posts on your archive pages. Choose excerpts only for your date, category, author, and tag archives. And consider using an SEO plugin to noindex at least some of your archives (on a single author blog you can disable the author archives altogether).
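To make the "excerpts only" idea concrete, here is a minimal Python sketch of what an excerpt is: the first handful of words of a post, followed by a marker that the post continues. The function name `make_excerpt` and the 55-word default are my own choices for illustration (55 is, as far as I know, the length WordPress uses for automatic excerpts). In practice your theme or an SEO plugin does this for you, so treat it as a picture of the idea rather than something to install.

```python
def make_excerpt(post_text, max_words=55, more="..."):
    """Return the first `max_words` words of a post, plus a 'more' marker.

    Hypothetical helper for illustration only; it is not part of WordPress.
    """
    words = post_text.split()
    if len(words) <= max_words:
        # Short posts are returned whole (with whitespace normalized).
        return " ".join(words)
    return " ".join(words[:max_words]) + " " + more


if __name__ == "__main__":
    post = "Ever read a new post someone has just shared and realize it sounds familiar?"
    print(make_excerpt(post, max_words=8))
```

Because the archive page then carries only a short teaser, the full text of each post lives at exactly one address: the post itself.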
FAQ
But how much duplicate text actually equals duplicate content?
Only the companies behind the search engines know the answer to that question. The truth is, my post about duplicate content is going to share some of the same keywords and links as other posts about duplicate content. But as long as I’m writing my own slant on the topic, in my own voice—not plagiarizing, or copying large portions of text—I’ll have done my best.
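If you are curious how overlap between two texts can be measured at all, here is a small Python sketch of one textbook approach: break each text into overlapping three-word chunks ("shingles") and compare how many chunks the two texts share. To be clear, this is not how Google does it (nobody outside those companies knows that), and the function names and the three-word chunk size are my own assumptions for illustration.

```python
def shingles(text, size=3):
    """Return the set of overlapping word sequences of length `size` in text."""
    words = text.lower().split()
    if len(words) < size:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(text_a, text_b, size=3):
    """Share of shingles the two texts have in common, from 0.0 to 1.0."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 1.0  # two empty texts are treated as identical
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    original = "the last thing you want to do is make your blog hard to find"
    reworded = "the last thing you want is a blog that nobody can find"
    print(round(jaccard_similarity(original, reworded), 2))
```

A score near 1.0 means the two texts are nearly word-for-word copies; a score near 0.0 means they share almost no phrasing, even if they cover the same topic with the same keywords.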
What about quotations?
The general rule for quotations is that you’re allowed to quote 50-300 words without permission from the author, depending on proportional length of the quotation to the original source. So if you’re quoting a few hundred words or less, duplicate content shouldn’t be an issue. But if you can sum up what they say without quoting them directly, all the better (as long as you give credit where credit is due!).
What about content aggregation?
There are many “aggregator” sites dedicated to gathering links to blog posts all in one place. From what I’ve read, aggregation does not equal duplication as long as only post excerpts are used and the excerpts link back to the original posts. If a site approaches you about aggregating your content, your larger concern should probably be the reputation of the site and whether—in light of possible SEO implications—it will actually bring you more readers.
What about syndication?
Syndication across the web, whether it’s a press release or a column, is going to result in some forms of duplicate content. The site with the highest ranking (a large newspaper, perhaps) will usually be the first to come up in the search results. Be careful where you allow your content to be syndicated. Ask the potential syndicating site what their practice is for dealing with duplicate content and SEO implications. If they haven’t thought about it, they should.
What if a site like the Huffington Post wants to republish my post?
If a site like the Huffington Post approaches you about one of your posts, you probably don’t want to say no. At the same time, the Huffington Post probably won’t want to noindex, nofollow your post on their site, nor add a canonical meta tag that lets search engines know your site was the original source. Best practice therefore may be to add noindex, nofollow to your own site’s version of the republished post and/or use a canonical meta tag to point to the large site republishing the post as the one search engines should index.
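For the curious, this is roughly what those two tags look like in a page's HTML head. The Python sketch below just builds the tag text; the function names and the example.com address are made up for illustration, and on WordPress an SEO plugin would normally write these tags for you. Note that the canonical hint is usually written as a link element rather than a meta element, even though people often call it a "canonical meta tag."

```python
from html import escape


def canonical_link_tag(original_url):
    """Build the tag that tells search engines which URL is the original."""
    # escape() keeps characters like & and " from breaking the attribute.
    return '<link rel="canonical" href="{}">'.format(escape(original_url, quote=True))


def robots_meta_tag(index=True, follow=True):
    """Build a robots meta tag, e.g. 'noindex, nofollow' for a republished copy."""
    directives = [
        "index" if index else "noindex",
        "follow" if follow else "nofollow",
    ]
    return '<meta name="robots" content="{}">'.format(", ".join(directives))


if __name__ == "__main__":
    # Hypothetical address of the big site's copy of your post.
    print(canonical_link_tag("https://example.com/blog/republished-post/"))
    print(robots_meta_tag(index=False, follow=False))
```

You would use one or both on your own copy of the post: the canonical tag points search engines at the version that should rank, and the robots tag asks them to leave your copy out of the index.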
What if I’ve already duplicated my content on my site or somewhere else?
Watch for the third post in this series where we’ll talk about simple ways to make SEO amends for duplicate content. Fixing the issue can be as complicated as merging and redirecting posts, or as simple as adding a canonical link to the meta of each duplicated post pointing to the original source.
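If "redirecting posts" sounds mysterious, here is a bare-bones Python sketch of what a permanent (301) redirect actually is: when someone asks for the old address, the server answers "moved permanently" and names the new one. The paths in the `REDIRECTS` table and the names `redirect_target` and `RedirectHandler` are hypothetical. On a real blog you would set this up with a redirect plugin or your host's settings, not with a script like this.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical mapping of old post addresses to their merged replacements.
REDIRECTS = {
    "/2012/old-post/": "/2015/refreshed-post/",
}


def redirect_target(path):
    """Return the new address for an old path, or None if there is none."""
    return REDIRECTS.get(path)


class RedirectHandler(BaseHTTPRequestHandler):
    """Answer requests for old posts with a 301 pointing at the new post."""

    def do_GET(self):
        target = redirect_target(self.path)
        if target is None:
            self.send_error(404)
            return
        # 301 means "moved permanently", so search engines can treat the
        # new address as the replacement for the old one.
        self.send_response(301)
        self.send_header("Location", target)
        self.end_headers()
```

To try it on your own computer, you could pass `RedirectHandler` to `http.server.HTTPServer` and call `serve_forever()`, then visit the old path in a browser and watch the address change.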
More in this series:
- How to Refresh Old Posts (without creating duplicate content)
- What to Do If You’ve Already Re-Posted Duplicate Content On Your Blog
More on the subject of SEO and duplicate content:
- SEO Tips for Beginners: What Is Duplicate Content?
- When is Reposting Content On Your Blog a Bad Idea?
- Duplicate Content: How Unique Is Unique Enough?
- Moz: What is Duplicate Content?
- Moz: What is Canonicalization?
- Moz: Beginner’s Guide to SEO
- The Moz Blog: Dealing with Duplicate Content
- The Moz Blog: Cross-Domain Canonical
- The Moz Blog: Duplicate Content in a Post-Panda World
Great post Gretchen, I’m glad that you are sharing your knowledge about the impact of unique and duplicate content on bloggers sites. If you want please try our new tool http://duplicate.ninja. There are some free samples of duplication that you can check. Let us know if that tool may be useful for you.
Thanks for the easy to follow tips! Followed them all, so hopefully my growing site won’t get hit with the duplicate content stick. Very clear, very newbie friendly. Well done!
Hi! I am wondering how it works for google if I delete the old post and then make a new post with the same, yet updated, content? I have some great stuff from when I first started blogging that I would love to reshare (not very many views happening on those at all right now) but I don’t want to affect the linking to my site.
Thanks!
If you do that, Angela, I’d simply make sure you do a 301 redirect from your old post to your new post. Or, use the same URL if you can so Google will view it as the same. 🙂
Great information! If I would happen to have a new branded website, and want to delete an old post from my old brand website and repost (even though the “old site” would then have a deleted blog post), would that impact duplicate content? I’m not sure if the Google Cache would cause problems for my new brand or not, considering I own both.
Thanks!
As I understand it, if you deleted it from your old site and posted it on your new site, you’d be fine. Especially if you could redirect the post link on your old site to your new site.
Awesome – thanks so much Gretchen!
Great info! Thanks so much for putting this together AND making it easy to understand. 🙂
You’re welcome, Candy! Thank you.
Great synopsis of the perils of duplicate content. One point about how Google handles duplicate content that you missed: Google assumes that the first publication of any content is the original and all others are duplicates and completely ignores them (regardless of nofollow/dofollow status).
With this in mind, from a search perspective, there is NO benefit to re-publishing content that has been published elsewhere, ever (barring a few quotes).
In its pursuit of creating the best user experience possible, Google will continue to refine its tools for filtering out duplicate content. If you are currently publishing duplicate content and not being penalized, don’t assume that you are home free forever. Google makes updates all the time and you could see all of your traffic disappear overnight (the Panda update put a lot of businesses out of business). Best practice is to simply not publish duplicate content — or allow your content to be duplicated — ever.
Thank you so much for this information, Lesa! So how does Google determine which content was published first? I assume it would have more to do with when Google first indexed something, since the publish date on a post or page could be edited?
Ditto to that best practice! I wish more understood it that way.
Thank you for this explanation. This was very helpful.
Okay, question for you. As I get ready to launch my new author site, I know you had recommended doing the redirect to avoid broken links from my old sites. What if there are older posts that I just don’t like or want to rewrite completely? Would it be best to unpublish those, rewrite them however I prefer and then publish them on my new author site? Or will Google still see that content out there even if it’s been unpublished almost three years later?
Sorry, that wasn’t written very clearly. Here’s my example. I have a blog post from two+ years ago on my current website called the ABC’s of Me. I’d like to use that content, rewritten slightly, as part of my About Me pages on my new Author site when it launches. The current site will be going away when the new one launches. How best should I handle posts that fall into that type of circumstance? Should I just plan to redirect all of my old posts to the new site or should I unpublish the ones I know I’ll rewrite and republish at the new place?
You know what? I don’t think it really matters too much. 🙂 The old site will no longer be live, and the redirects will not be line by line but rather one basic domain to domain redirect. So, as long as you remove the post before you publish it as a page, it will be fine. Making the post private now, before you move, might ensure it’s removed from Google before it’s published on the new site. But I don’t think it will make too big a difference either way. Does that help? 🙂
Thanks so much for writing this! I clicked over from my email to tell you that. 🙂 I haven’t read it yet, but I’m pinning for later. Thanks again!
Thank you, Jillian! I hope it’s helpful when you read it. 🙂