The Google Duplicate Content Penalty. Myth or Fact?

Perry Bernard

21 October 2015, 1:33PM

I’ve watched a few arguments about this one in the last two years, and there seems to be a fair bit of variation of opinion on it. While I don’t really want to state emphatically that it doesn’t exist, I’ll tell you why I don’t see it as an issue and often act as if it doesn’t.

One man’s word, and a bit of logic.

John Mueller, a Webmaster Trends Analyst at Google whose word is quite reliable when it comes to how Google works, has said that when Google sees duplicate content, the issue it wants to avoid is having two pages in the search results that offer basically the same content. From a user perspective, duplicate content in search is not ideal: as a user you probably want a variety of choices in the search results, not the same thing several times over.

But let’s say there are two or more URLs somewhere on the internet that have exactly the same content. What happens to these if they are both equally well-matched to a search query? Here’s my take on it:

If the two pages (or maybe it’s really just one page reached via two or more directory paths) compete against each other in search results, then one will win and show up in search, and the other will lose and may not appear in the results at all. Does that mean the second page was penalised? Well, no. It just means that the first page had some edge that we are yet to discover, and since both pages contained the same info, there was no point in showing the second result.

A common way that content gets duplicated is when a website has two or more URL paths that lead to the exact same page. Take an eCommerce store, for example. I love using a clothing or footwear store to illustrate content duplication, because their products typically come in many different sizes and colours. Say a clothing store sells a t-shirt in 5 colours and 5 sizes. That makes 25 possible combinations. Depending on how the store operates and how it displays these variations, it may have a unique URL for each, creating up to 24 near or total duplicates of one main page of content.

Where it gets more interesting is when the store lists the product in more than one category too. It may be in the “summer wear” category, the “tops” category and the “cotton” category all at once. Plus, if the website supports internal search, the product probably also appears on the search results page for “t-shirts”, on the tag page for “red” and so on. Wowie! Do we have a serious duplication issue here!

Does that mean the site gets a duplicate content penalty because of all the pages it ends up with? Well, no. Not really. It’s just that only one of the URL variations will show up in search when someone searches for “red cotton t-shirts”. But which one? To help control which URL gets priority, you can use a tag called “canonical”. Each duplicate page carries a canonical statement that names the page you would prefer to appear in Google search. The statement looks something like this:

<link rel="canonical" href="http://mydomain.com/my-preferred-page-for-red-cotton-t-shirts/" />

(your actual implementation will not match this!)

This statement needs to appear in the head of the page.
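
As a rough sketch of the t-shirt arithmetic, the Python below builds one URL per colour and size combination and the single canonical statement they would all share. The colour and size lists, the base URL and the `colour`/`size` query parameters are assumptions made up for illustration; a real store will structure its URLs differently.

```python
from html import escape
from itertools import product

# Hypothetical values for illustration only; not taken from any real store.
COLOURS = ["red", "blue", "green", "black", "white"]
SIZES = ["xs", "s", "m", "l", "xl"]
BASE_URL = "http://mydomain.com/tops/cotton-t-shirt/"
CANONICAL_URL = "http://mydomain.com/my-preferred-page-for-red-cotton-t-shirts/"


def variant_urls(base_url, colours, sizes):
    """Return one URL per colour/size combination."""
    return [f"{base_url}?colour={c}&size={s}" for c, s in product(colours, sizes)]


def canonical_tag(canonical_url):
    """Build the <link> element to place in the <head> of every variant page."""
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}" />'


urls = variant_urls(BASE_URL, COLOURS, SIZES)
tag = canonical_tag(CANONICAL_URL)

print(len(urls))  # 25 variant URLs, all carrying the same tag
print(tag)
```

Every one of the 25 variant pages would include the same tag, so they all name one preferred page.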

So what happens when other matching pages are assessed for rank? Google looks for the statement and, in most cases, displays the page the statement refers to in priority over other URLs with the same content. That is not a ranking penalty, and it doesn’t stop you from getting business, because the page you selected as the canonical page should be the main page for cotton t-shirts. From there, the user can select size and colour and make the purchase. Ask your web developer how to implement canonical tags on your site.
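
To make that idea concrete, here is a minimal sketch of how any crawler could read the canonical statement from a page and fold duplicate URLs into one group. It is only an illustration using Python’s standard `html.parser` module, not a description of how Google actually does it; the helper names and sample URLs are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attr.get("href")


def find_canonical(html_text):
    """Return the canonical URL named in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical


def group_by_canonical(pages):
    """Map each preferred URL to the list of URLs that point at it.

    pages is a dict of {url: html_text}. A page without a canonical
    statement is treated as its own preferred URL.
    """
    groups = defaultdict(list)
    for url, html_text in pages.items():
        groups[find_canonical(html_text) or url].append(url)
    return dict(groups)


# Hypothetical sample pages for illustration.
PREFERRED = "http://mydomain.com/my-preferred-page-for-red-cotton-t-shirts/"
HEAD = f'<html><head><link rel="canonical" href="{PREFERRED}" /></head><body></body></html>'
pages = {
    "http://mydomain.com/summer-wear/cotton-t-shirt/": HEAD,
    "http://mydomain.com/tops/cotton-t-shirt/": HEAD,
    "http://mydomain.com/about/": "<html><head></head><body></body></html>",
}
groups = group_by_canonical(pages)
```

Here the two category URLs end up grouped under the preferred page, while the page with no canonical statement stands on its own.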

Who created the duplicate?

I regularly post content on social media channels like Facebook and LinkedIn. A lot of the time, I’ll post the exact same post in 5 different places. Each of those posts works for me. I mean, they all operate equally as a lead source, and I don’t really care which one works better than another, so long as one works. In the case of social media posts, I can’t use the canonical tag because I’m not in control of the head of the HTML document. The social media platform controls that. So canonical is a strategy I can’t use to signal a preferred page. And I don’t even really care. That’s because no matter which version gets found, I’ve presented my value proposition, a link to my page, a way to contact me, a reason to contact me, or maybe some other conversion method and elements, and I’ve effectively spread my eggs between several baskets.

In cases where someone re-shares or re-tweets my content, it still works as a lead source for me, but now I have even more duplicates. I actually want that to happen, because each duplicate might rank for slightly different reasons and slightly different searches, plus Facebook, Twitter, Google+ and LinkedIn all have their own search engines. Now I am listed in a bunch of different search platforms with multiple chances of getting found, contacted and hired. Yay!

The only reason I (sometimes) don’t want my content duplicated:

When someone grabs content from my website and claims it as their own, they are stealing from me. I don’t appreciate that, nor would I want them getting creative credit for the content. Worse still, I really don’t want Google presenting their version of the page over mine. Can that even happen? Yes. Can Google detect which page has the original content? Possibly. Their algorithm is smart, but I have noticed it doesn’t seem to care which page was live first, and depending on crawl limitations, it might not have detected that anyway. It only cares which page has an edge from the user perspective, and it might not be mine.

In cases where your content does get ripped off, you can file a takedown request with Google under the Digital Millennium Copyright Act (DMCA) to have the offending page removed from Google search. If the thief is local, you might also be able to take legal action under local copyright or intellectual property laws, but that might prove expensive. Just look at it this way: duplication is a form of flattery! Obviously someone deemed your content worthy of using themselves. If it came to a battle over writing content, I can pump out 2,000-word articles in just an hour or two. The content thief can probably only copy, not create. Not much intellect required there, so I like to see the ripped-off content as a compliment to me as the content creator and not stress too much over the theft issue.

BTW: The DMCA is a US law, but since Google is a US company, it handles DMCA takedown requests for its search results even if you are not located in, or laying a complaint from, the USA.

Clearly from my content above you can see I don’t really worry too much about the “duplicate content penalty”, if it exists. I just believe that if one duplicate page has slightly more favourable ranking factors, it will show up in search ahead of the other duplicates. But a different duplicate from the same set can show up if it is slightly more favourable for a different search. I don’t believe this should be viewed as a penalty. And if the duplicates use canonicalisation to point to a ‘master version’, then the master version will show up in search. Google supports cross-domain canonicalisation too, so people using your content can even mark their copy with a canonical link pointing to your website. I often do this myself when I place identical content on two or more of my websites. I practise exactly what I preach.
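
If you want to check whether a copy’s canonical link points back to the same site or across to another domain, a small sketch like the one below will do. The function name and the example.com and example.org URLs are hypothetical, and the comparison is deliberately simple (exact URL match, then hostname match).

```python
from urllib.parse import urlsplit


def canonical_relationship(page_url, canonical_url):
    """Classify how a page's canonical link relates to the page itself."""
    if page_url == canonical_url:
        return "self"  # the page names itself as the master version
    if urlsplit(page_url).hostname == urlsplit(canonical_url).hostname:
        return "same-domain"  # another URL on the same site is the master
    return "cross-domain"  # the master version lives on a different site


# Hypothetical URLs for illustration.
original = "http://example.com/duplicate-content-myth/"
copy_elsewhere = "http://example.org/guest-posts/duplicate-content-myth/"

print(canonical_relationship(copy_elsewhere, original))
```

A republished copy that names the original article as its canonical comes out as cross-domain, which is the case described above.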

Did you like this article? Please feel free to duplicate and share it with a credit to me. http://perrybernard.com

For assistance with your website content, please contact Forge Online at +6493020447 or email to sales@forge.co.nz or visit: http://www.forgeonline.co.nz