Duplicate content & SEO: Causes & solutions
Duplicate content is a hot topic in SEO, and if you put yourself in Google’s shoes it’s easy to see why. To stay at the top of its game Google must provide the best possible search results, and nobody wants to see the same content appearing again and again in the SERPs. Google deals with this ‘problem’ by deciding which is the most ‘authoritative’ version to display and ignoring the rest. It’s a neat solution and works most of the time.
There’s plenty of talk in SEO forums about ‘duplicate content penalties’, but in reality you’re unlikely to be penalised for publishing duplicate content. However, duplicate pages won’t appear in the search results and this can have a knock-on effect on the rest of the site.
With billions of pages to index, Google’s spiders have really got their work cut out and can’t afford to waste time. As a spider crawls a website it compares the content on each page with that already stored in Google’s database. If the content is the same (or very similar) the spider will look elsewhere for something new. And this may well mean leaving your site and not coming back.
So duplicate content can affect not only individual page rankings but also the overall performance of the site. Surely the easiest way around this is to make sure that all your content is unique? Well yes…and no. Unique content is essential, but it’s equally important that your content appears unique to Google. And you may be surprised at how easily Google gets confused.
The most common causes of duplicate content
- Multiple domains pointing to the same page
- Duplication of copy across different domains
- Database driven sites and dynamic pages
- Hosting ‘print friendly’ pages
- Syndicating articles for reuse elsewhere
- Copy theft and republication
Exactly how Google decides which ‘version’ of duplicated content is the most ‘authoritative’ is the $64,000 question. However, there are a number of likely candidates, including the date the page was first indexed, Google’s trust in the site as a whole and the strength of the individual page.
Multiple domains pointing to the same page
Having a number of different domain names pointing to the same page can cause duplicate content problems. Sorting this out is what Search Engine Optimisers have loftily dubbed ‘canonicalization’, where the ‘canonical’ version is the original and most authoritative version.
Put simply, Google identifies each URL as a different page, so if you point www.myurl.co.uk and www.myurl.com to the same server space, they appear to Google as two separate pages with identical content.
You’ll encounter the same problem if your internal links point to the homepage in more than one way, for example a mixture of www.myurl.com and www.myurl.com/index.htm. Again Google sees two duplicate pages.
Once you’ve identified the problem it can be ‘fixed’ by adding the canonical tag to the duplicate pages or with the judicious use of 301 (permanent) redirects.
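To make the idea concrete, here is a minimal Python sketch of the decision behind both fixes. It assumes that www.myurl.com is the version you want indexed and that index.htm and index.html both serve a folder's default page; the function names are purely illustrative and not part of any particular server or CMS.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Assumption: www.myurl.com is the version you want Google to index.
CANONICAL_HOST = "www.myurl.com"
# Assumption: these file names serve the same page as the bare folder URL.
INDEX_FILES = {"index.htm", "index.html"}


def canonical_url(url: str) -> str:
    """Return the single URL that every variant of a page should resolve to."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Collapse /index.htm (and similar) onto the folder URL.
    head, _, tail = path.rpartition("/")
    if tail in INDEX_FILES:
        path = head + "/"
    return urlunsplit((parts.scheme, CANONICAL_HOST, path, parts.query, ""))


def redirect_for(url: str):
    """Return (301, target) if the requested URL is not canonical, else None."""
    target = canonical_url(url)
    return None if url == target else (301, target)


def canonical_tag(url: str) -> str:
    """Build the <link> element that goes in the <head> of a duplicate page."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'
```

In practice the redirect itself is usually configured on the web server rather than in application code; the sketch only shows which URL each variant should end up at.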
Duplication of copy across different domains
It isn’t unusual for large companies to have a suite of websites targeting particular niche markets. Say, for example, you already run a car insurance business and want to set up another website to target ‘young drivers’. Much of the information across the two sites (such as ‘Terms & Conditions’, ‘Warranties’ and ‘Company History’) will be the same, so there’s a natural temptation to use the same content twice.
The trick is to not give in to temptation and to rewrite the content. However, if you don’t have the time or editorial resources, you can sidestep the problem by using robots.txt to prevent the duplicated pages from being crawled.
Many of the above problems can be avoided by keeping tight control of your copy:
- New markets should be targeted with new copy. Don’t make the mistake of duplicating content across different URLs or different domain extensions.
- Be wary of ‘free copy’. If you are getting something for free, such as a list of product descriptions, the chances are plenty of other webmasters are getting it too.
- Don’t give your content away lightly. Handing over your copy to third parties, including affiliates, can cause your site serious problems.
Print friendly pages can be seen as duplicate
While users may welcome print friendly pages, spiders see them as duplicate content. The solution is quick and simple: put all your printer friendly pages in a single folder and prevent them from being spidered with robots.txt.
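Before relying on a robots.txt rule it is worth checking that it blocks what you expect. The sketch below uses Python's standard urllib.robotparser module; the /print/ folder name and the example URLs are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumption: all printer-friendly pages live under a /print/ folder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""


def is_crawlable(url: str, robots_txt: str = ROBOTS_TXT, agent: str = "*") -> bool:
    """Return True if the given robots.txt rules allow `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The printer-friendly copy is blocked; the original article is not.
print(is_crawlable("http://www.myurl.com/print/article.htm"))
print(is_crawlable("http://www.myurl.com/article.htm"))
```

Keeping every printer-friendly page under one folder means a single Disallow line covers them all, which is why the folder approach is less error-prone than listing pages one by one.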
Dynamic pages are notorious for duplicate copy
With database driven sites the same page can often be accessed in a number of ways depending on how the parameters are ordered. For example:
Will display exactly the same page as:
However, because the parameters (‘product name’, ‘product type’ and ‘product colour’) are in a different order Google naturally reads this as two separate URLs…with the same content.
To overcome this problem your programmer needs to make sure that each variable is called up in the same order, no matter where you are on the site. There are other ‘fixes’ available (using 301 redirects or with the help of robots meta tags), but they require a great deal of technical know-how and aren’t for the faint-hearted.
Similar problems can occur with dynamic sites that use session IDs. These can be cleared up by using another method of tracking, such as Google Analytics.
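One way to guarantee a consistent order is to sort the parameters whenever a link is generated. The following is a rough sketch using Python's standard urllib.parse module; the URLs and parameter names (name, type, colour) are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalise_query(url: str) -> str:
    """Rewrite a URL so its query parameters always appear in sorted order."""
    parts = urlsplit(url)
    # parse_qsl keeps each (name, value) pair; sorting makes the order stable.
    params = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Hypothetical URLs: same parameters, different order.
a = "http://www.myurl.com/products?name=mug&type=kitchen&colour=red"
b = "http://www.myurl.com/products?colour=red&type=kitchen&name=mug"

# Both variants collapse to one URL, so Google only ever sees one page.
print(normalise_query(a) == normalise_query(b))
```

Alphabetical order is an arbitrary choice here; any fixed order works as long as every link on the site uses it.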
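If session IDs can't be removed from the site straight away, the clean URL can at least be worked out for use in links or a canonical tag. This is only a sketch: the parameter names in SESSION_PARAMS are assumptions, so check what your own platform actually uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: typical session parameter names; yours may differ.
SESSION_PARAMS = {"sessionid", "sid", "phpsessid"}


def strip_session_id(url: str) -> str:
    """Return the URL with any session ID parameters removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Every visitor then maps to the same address, whatever session ID they were given.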
Article syndication can cause duplication
Allowing other sites to use your content as a means of generating revenue, publicity or links can easily backfire. If Google decides that your original copy looks more ‘authoritative’ on someone else’s website, your pages may drop out of the search results in favour of theirs.
Take the example of a fledgling travel writer who syndicates a feature to one of the online travel giants. As Google, who would you trust to rank: the tried-and-tested titan of travel or a rookie blogger?
The simple answer is to keep your copy to yourself. However, there are some steps you can take to tell the search engines that your site is the original source of the content. First you need to make sure that your copy is online and has been indexed by Google before releasing it to anyone else. Next ask any sites that are reusing your content to provide a link from the published article back to your website. It’s a belt and braces approach which reinforces your site as the ‘authority’ in the search engines’ eyes. However, you can avoid any lingering doubt by rewriting all content before distribution.
You can also add Google’s author tag to each page, which tells the search titan who the original author is.
Article syndication sites should be given a wide berth. You may get one or two inbound links from webmasters reusing your (rewritten) copy, but the quality of links often leaves a lot to be desired.
Copy theft is a very unwelcome source of duplication
Plagiarism and copy theft are rife on the web. While it would be flattering to think the culprits are genuine admirers of your work, copyright infringement is usually down to scraper sites automatically duplicating your content. If you think someone has stolen your copy, take the following steps:
- To check if your content has been published elsewhere visit Copyscape. Alternatively run a Google search with a line or two of your copy in “double quotes”.
- If you find a guilty website, email the webmaster asking them to take it down immediately. If you can’t find a contact address on the website in question visit the whois database.
- If you don’t get the desired response, the next step is to get in touch with the hosting company and inform them; this usually does the trick.
- Next get in touch with the advertisers. Most scraper sites carry advertising, and as the biggest network this often means Google’s AdSense. Google subscribes to the Digital Millennium Copyright Act (DMCA) and works hard to stamp out copy theft.
- Finally get in touch with the search engines. Report the site as spam, supporting your claim with as much evidence as possible, and keep your fingers crossed.
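For the first step, a simple script can build the quoted search phrase and give a rough measure of how much of your copy turns up on a suspect page. This is a minimal sketch using overlapping five-word ‘shingles’; it is an illustration only, not how Copyscape or Google actually compare pages, and the function names are made up for the example.

```python
import re


def shingles(text: str, size: int = 5) -> set:
    """Split text into overlapping runs of `size` words, ignoring case and punctuation."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a: str, b: str, size: int = 5) -> float:
    """Jaccard similarity of two texts: 0.0 means no shared runs, 1.0 means identical."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def quoted_query(text: str, max_words: int = 12) -> str:
    """Wrap the first few words of your copy in double quotes for an exact-match search."""
    return '"' + " ".join(text.split()[:max_words]) + '"'
```

A high score on a page you didn't authorise is a good reason to start working through the remaining steps; what counts as ‘high’ is a judgment call and depends on how much boilerplate the two pages share.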
It’s also a good idea to use the copyright symbol across your site and make it clear in your ‘Terms and Conditions’ that you actively check for stolen copy and will take action if it’s found.