Managing Duplicate Content Issues on the Magento Platform

As discussed in our last article on the subject, the Magento platform is a great ecommerce system for new merchants who are looking to get their feet wet with online retailing. Not only is the platform intuitive to use, it also provides a great customer experience right out of the box (and did we mention that it’s free?!).

Of course, with any standardized system, it’s important to make sure its default behaviors don’t prevent the search engines from indexing your site correctly. And while Magento isn’t that bad for SEO out of the box, one particular weakness you’ll want to be aware of is the way the platform creates duplicate content by serving the same page at multiple URLs.

Now, contrary to popular belief, you aren’t going to be penalized by the search engines for having these instances of duplicate content on your site. However, since they can screw up the way the search engines crawl and index your site – leading to less-than-ideal URLs being displayed in the SERPs – it’s important that you take the following steps to minimize these issues:

1 – Control Canonical Tags Effectively

As mentioned previously, one of the biggest SEO weaknesses of the Magento platform – and, really, of most other CMS platforms as well – is the creation of multiple URLs that all serve a single page. With the way Magento creates and rewrites URLs by default, a single product can wind up accessible at all of the following addresses:

  • Yoursite.com/product1.html
  • Yoursite.com/category1/product1.html
  • Yoursite.com/catalog/product1/view/id/1/
  • Yoursite.com/catalog/product1/view/id/1/category/1/

But while each of these URLs may serve a valid purpose within your Magento installation, they can be difficult for the search engines to process and index correctly. If the search engines index all of these variations, they’ll likely be treated as duplicate content, leaving the engines to guess which specific version should be displayed in the search results.

In our first article on SEO for Magento, we covered a few URL rewrite steps you’ll want to take to improve your store’s optimization – including enabling search-engine-friendly (SEF) URL rewrites and specifying a www or non-www base for your URLs.

However, taking these steps alone won’t prevent the creation of multiple URLs, which is where the canonical tag comes in handy. This tag – which is recognized by Google, Yahoo and Bing – tells the search engines which URL is the preferred version of a page, so that they index that URL instead of the duplicate’s own address.

As an example of how this technique should be implemented, suppose you’re editing the “Yoursite.com/catalog/product1/view/id/1/category/1/” page from the list above. Ideally, you’d want the cleaner “Yoursite.com/product1.html” version of this page displayed in the SERPs, as it could potentially increase click-through rates to your site. To tell the search engines how the longer URL should be treated, add the following code to the <head> section of the page served at the longer URL:

<link rel="canonical" href="http://www.yoursite.com/product1.html">
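If your templates build this tag programmatically, the one subtle part is escaping the URL before it goes into the href attribute. The Python sketch below is purely illustrative (Magento templates themselves are PHP); the function name canonical_link_tag is hypothetical, and the URL is the example from above.

```python
from html import escape


def canonical_link_tag(preferred_url: str) -> str:
    """Build the <link rel="canonical"> element for a page's <head> (hypothetical helper)."""
    # quote=True escapes &, <, >, " and ' so the URL is safe inside the attribute value.
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}">'


print(canonical_link_tag("http://www.yoursite.com/product1.html"))
```

Without the escaping step, a preferred URL containing a query string such as ?a=1&b=2 would produce an attribute with a raw ampersand.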

Of course, while adding canonical tags to your site is a great way to minimize instances of duplicate content, there’s one pretty obvious weakness. If your site has only a handful of products, adding these snippets by hand isn’t a big deal, but what if you have hundreds or thousands of different items for sale? Are you really expected to go through and add canonical tags to every potential instance of duplicate content on your site?

Fortunately, a much easier way to handle this is with an add-on module like Yoast’s “Canonical URLs” or MageWorx’s “SEO Suite Pro.” Either of these options (or any of the similar extensions available today) will add the necessary canonical tags to your site automatically, consolidating URL variations and helping you avoid duplicate content filters.
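Conceptually, what these extensions automate is a lookup from every duplicate URL to one preferred address. The sketch below is a hypothetical illustration, not how Yoast’s or MageWorx’s modules are actually implemented: the catalog rows, the BASE host and the choice of the short /<url-key>.html form as the preferred URL are all assumptions, and the URL shapes simply mirror the list earlier in this article.

```python
from typing import Dict, List, Tuple

# Assumed store host for illustration.
BASE = "http://www.yoursite.com"

# Hypothetical catalog row: (product ID, URL key, [(category ID, category URL key), ...]).
Product = Tuple[int, str, List[Tuple[int, str]]]


def url_variants(pid: int, key: str, categories: List[Tuple[int, str]]) -> List[str]:
    """List the duplicate URL shapes (as in the article's list) for one product."""
    variants = [f"/{key}.html", f"/catalog/{key}/view/id/{pid}/"]
    for cat_id, cat_key in categories:
        variants.append(f"/{cat_key}/{key}.html")
        variants.append(f"/catalog/{key}/view/id/{pid}/category/{cat_id}/")
    return variants


def canonical_map(products: List[Product]) -> Dict[str, str]:
    """Point every variant of every product at its short /<key>.html URL."""
    mapping: Dict[str, str] = {}
    for pid, key, categories in products:
        preferred = f"{BASE}/{key}.html"
        for path in url_variants(pid, key, categories):
            mapping[path] = preferred
    return mapping


for path, target in canonical_map([(1, "product1", [(1, "category1")])]).items():
    print(path, "->", target)
```

Each page’s template would then look up its own path in this mapping and emit a canonical tag pointing at the result, which is the step that becomes impractical to do by hand for a large catalog.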

2 – Prevent Indexing of All Non-Content Pages

Once you’ve got your canonical tags in place, the next step in improving Magento’s default content handling and preventing further instances of duplicate content is to add “noindex” robots meta tags to all non-content pages. This includes your checkout pages, your customer account pages and your internal search results pages – basically, anything you wouldn’t want to appear in Google’s natural search results.
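The rule itself is simple: pages under non-content paths get “NOINDEX,FOLLOW” (stay out of the index, but let crawlers follow their links), and everything else stays indexable. In this minimal Python sketch the path prefixes are assumptions modeled on Magento 1’s default checkout, customer account and catalog search routes; check them against your own store’s URLs.

```python
# Hypothetical non-content path prefixes, modeled on Magento 1's default
# checkout, customer account and catalog search routes (an assumption).
NON_CONTENT_PREFIXES = ("/checkout/", "/customer/account/", "/catalogsearch/")


def robots_meta(path: str) -> str:
    """Return the meta robots tag to place in the <head> of the page at `path`."""
    if path.startswith(NON_CONTENT_PREFIXES):
        # Keep the page out of the index, but let crawlers follow its links.
        content = "NOINDEX,FOLLOW"
    else:
        content = "INDEX,FOLLOW"
    return f'<meta name="robots" content="{content}">'


print(robots_meta("/checkout/cart/"))  # <meta name="robots" content="NOINDEX,FOLLOW">
```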

Traditionally, this is controlled by adding “noindex” to a page’s meta robots tag, but the Magento platform can make these tags difficult to access and configure correctly. Although Groove Commerce has a good example of how a Magento robots.txt file can be modified to add the appropriate crawling controls, an easier solution is to use Yoast’s Meta Robots module to set “noindex” tags beyond the standard options offered in the “Web” section of the Magento backend.
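Keep in mind that robots.txt controls crawling rather than indexing: a Disallow rule stops compliant spiders from fetching a page, but a page blocked this way can’t show them its “noindex” tag either, so the two approaches are alternatives rather than a pair. To sanity-check robots.txt rules before uploading them, Python’s standard urllib.robotparser module can evaluate them offline. The rules below are a hypothetical example, not the Groove Commerce file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for a Magento store; the paths are assumptions.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Disallow: /customer/
Disallow: /catalogsearch/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
print(parser.can_fetch("Googlebot", "http://www.yoursite.com/checkout/cart/"))  # False
print(parser.can_fetch("Googlebot", "http://www.yoursite.com/product1.html"))  # True
```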

3 – Add “Nofollow” Tags to Non-Content Page Links

In addition to preventing your Magento site’s non-content pages from being indexed by the search engines, you’ll want to add a “nofollow” attribute to any internal links that point to these pages.

Unfortunately, there aren’t currently any extensions available that add these link attributes automatically within Magento itself. However, you can add “nofollow” to the links in your template files by hand, so you won’t have to manually edit each individual link every time you add a page to your website.
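In a template, the logic amounts to checking each internal link’s target and adding rel="nofollow" when it points at a non-content page. Magento’s .phtml templates are PHP, so the Python below only illustrates that logic; the internal_link helper and the path prefixes are hypothetical.

```python
from html import escape

# Hypothetical non-content path prefixes (an assumption; match your store's routes).
NON_CONTENT_PREFIXES = ("/checkout/", "/customer/account/", "/catalogsearch/")


def internal_link(href: str, label: str) -> str:
    """Render an internal <a> tag, adding rel="nofollow" for non-content targets."""
    rel = ' rel="nofollow"' if href.startswith(NON_CONTENT_PREFIXES) else ""
    return f'<a href="{escape(href)}"{rel}>{escape(label)}</a>'


print(internal_link("/checkout/cart/", "Cart"))
print(internal_link("/product1.html", "Product 1"))
```

Putting the check in one shared helper (or one shared template block) is what saves you from editing each link individually.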

4 – Add a Sitemap

Next up, while it’s not a foolproof way to keep duplicate content from being detected on your site, adding a sitemap that tells the search engines’ spiders which pages to crawl can help promote proper crawling and indexation (especially when combined with the techniques described above).

Magento does offer an XML sitemap building utility within its backend (found by navigating to Catalog -> Google Sitemap -> Add Sitemap), but you’ll want to keep an eye on this built-in functionality to ensure that your sitemap is regenerated whenever you add new products.
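Whichever tool generates your sitemap, it helps to list only the preferred (canonical) URL for each page, so the sitemap reinforces your canonical tags instead of advertising the duplicates. As a minimal sketch of the file format, assuming a hypothetical list of canonical URLs, Python’s xml.etree.ElementTree can produce a bare sitemaps.org urlset; the optional lastmod, changefreq and priority elements are omitted here.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: Iterable[str]) -> str:
    """Return a minimal sitemap.xml document listing each URL once."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    # dict.fromkeys drops repeated URLs while keeping their original order.
    for url in dict.fromkeys(urls):
        loc = ET.SubElement(ET.SubElement(urlset, "url"), "loc")
        loc.text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_sitemap([
    "http://www.yoursite.com/product1.html",
    "http://www.yoursite.com/product2.html",
]))
```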

5 – Rewrite Any Stock Product Descriptions

The final issue that must be mentioned when it comes to managing duplicate content on ecommerce sites is the need to rewrite any stock product descriptions you use when reselling mass-manufactured items (though really, this issue affects all online retailers – not just those on the Magento platform!).

Basically, if you’re a reseller, drop-shipper or other shop that sells widely available products, you might find it tempting to simply copy the manufacturer’s stock product descriptions and paste them – word-for-word – onto your own site. Please, please, please don’t do this!

At best, the search engines won’t penalize you for this practice (which is essentially plagiarism), though you will lower your odds of providing an enjoyable experience for your store visitors. At worst, your site’s content (even if it’s marked up correctly with canonical tags) won’t be displayed in the SERPs if it’s judged to duplicate already-indexed sites that copied the same descriptions you did.

Because recent Google algorithm updates (most notably Panda, which targets thin and low-quality content) seem to revolve around rewarding high-value content, take the time to craft your own unique product descriptions. Make them as engaging and enticing as possible to give your site the best long-term advantage from both a search engine and user experience standpoint!
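If you want a rough, do-it-yourself check that a rewrite has really moved away from the manufacturer’s copy, one common technique is to compare overlapping three-word sequences (“shingles”) using Jaccard similarity. This is an illustrative sketch only: it is not how Google detects duplicate content, and the three-word window is an arbitrary assumption.

```python
import re
from typing import Set, Tuple


def shingles(text: str, k: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word sequences) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def overlap(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of two texts' shingle sets (0.0 = nothing shared, 1.0 = identical)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


stock = "A durable steel frame and a lifetime warranty."
rewrite = "Built on welded steel, this frame is backed for life."
print(round(overlap(stock, rewrite), 2))
```

A score near 1.0 means the two descriptions share most of their phrasing; a genuine rewrite should land far lower.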

Image: katerha

3 Responses

  1. Richard

    Nice post, some helpful tips, thanks.

  2. Rewriting Product Descriptions | thewritedata.com

    [...] here’s the catch. According to Single Grain SEO, you should rewrite stock product descriptions because it can be seen as a negative by Google [...]

  3. Tim Clark

    Yoast Meta Robots will not work on versions later than 1.4, so you can use your local.xml file as below.

    NOINDEX,FOLLOW