Using blog content theft to your advantage

by james on February 23, 2009

At some point or another as a blogger you will almost certainly have to deal with scraper sites. For those who don’t know, these are sites that take your content and republish it as their own, usually without properly attributing the source.

As a scraper, the easiest way of getting content is from an RSS feed, and there are plenty of plugins for popular content management systems that allow you to pick up an RSS feed and convert it into a post which you then pass off as your own. The plan then is to rank higher for key search terms than the sites you are taking the content from. This type of content theft is copyright infringement, and in an ideal world where everyone plays fair it wouldn’t happen. But it does, so use it to your advantage: you can even make money from it.

So, honest publisher, how do you deal with scrapers, the scourge of the internet? Firstly, it helps to stop thinking of scrapers as the scourge of the internet. Think of them as another great way to get free keyword rich links back to your site, without you having to do any work. They are picking up your RSS feed and republishing it on another domain. This should start a few neurons buzzing about in your brain: hang on, if they’re trying to rank highly, they’re trying to accrue PageRank, and I can force some of that to be passed to me if they’ve been sloppy in doing their scraping… Which, trust me, many of them will have been!

Traditional methods

Before we get to how to implement that, let’s go over some of the more traditional ways of trying to stop scrapers. Firstly, you can cut your RSS feed down to snippets. If you’re running your feeds through Feedburner you can use the “summary burner” option, so that you only show a certain number of characters before people have to click through to your site to read the whole story. This doesn’t entirely solve the problem of scrapers, and it probably won’t do much to increase the CTR to your site either. You can also go down the ‘legal route’ of sending the offender a cease and desist letter, which will cost you in legal fees, take time and probably have little to no impact whatsoever. Lastly for the more traditional routes, you can check your logs, find the offender’s IP address and add lines to your .htaccess file denying the scraper as follows:

# Apache 2.2-style access rules: allow everyone, then deny the listed addresses
<Limit GET POST>
Order allow,deny
Allow from all
Deny from [enter IP address here]
Deny from [enter another IP address here]
</Limit>

This is probably the best way of stopping scrapers but, quite frankly, it is a pain to implement.
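If the list of offenders grows, one way to take some of the pain out of it is to generate the block rather than type it by hand. What follows is only a minimal sketch in Python: the function name build_deny_block is my own invention, and the addresses in the usage line are reserved documentation addresses, not real scrapers. It validates each address, drops duplicates and emits the same style of block as above.

```python
import ipaddress


def build_deny_block(ip_addresses):
    """Return an .htaccess <Limit> block that denies each given IP address.

    Hypothetical helper for illustration; raises ValueError on a malformed address.
    """
    lines = ["<Limit GET POST>", "Order allow,deny", "Allow from all"]
    seen = set()
    for raw in ip_addresses:
        # ip_address() validates the string and normalises its formatting
        ip = str(ipaddress.ip_address(raw.strip()))
        if ip not in seen:  # skip duplicates so the file stays tidy
            seen.add(ip)
            lines.append(f"Deny from {ip}")
    lines.append("</Limit>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder addresses from the reserved documentation ranges
    print(build_deny_block(["192.0.2.10", "198.51.100.7", "192.0.2.10"]))
```

Paste the output into your .htaccess file in place of the hand-written block. A mistyped address raises an error instead of quietly ending up in the file.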

Using scrapers to benefit your site

So, back to using the content thieves to your advantage. As mentioned earlier, many scraper sites don’t do a very good job of taking your content and doing anything with it other than simply republishing it. Use this to your advantage. Write articles that contain links with keyword rich anchor text to other areas of your site. For example, I could link in this article to my page on Bristol SEO, and if this page were to be copied by your average scraper site I would have a free link to that page, without a nofollow on it, with the exact anchor text I wanted to rank for in place. So on top of the benefit that you will see from internally linking anyway, you will also pick up more external links to your site.

Where this makes even more sense is when you have the Simpletags WordPress plugin installed, as mentioned in my second post on WordPress SEO. The plugin provides you with an option to include links to related articles within your posts, and also within your RSS feeds, and you can set how many related posts you want to link to. Once you have a reasonable number of posts on your site, and you’re beginning to see the plugin working, you’ll start to see that you’re producing a good few extra links to other content on your site. It’s likely that genuine human readers of your feed will click through on these links, thereby increasing pageviews. More importantly for the site that’s being scraped, each scraped post gives you extra links back to your site (five, if that is the number of related posts you have set), each with the title of the linked post as its anchor text, which if you’re sensible will be good, keyword rich anchor text. Again, these scraper sites are trying to rank for your content, but are shooting themselves in the foot because you are including links back to your site in your copy and in your related links. They are therefore haemorrhaging PageRank back to you, which should help you to rank higher than them.
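To see whether a particular scraper really has been sloppy, you can save a copy of the scraped page and check which of your links survived. The following is a rough sketch using only Python's standard library. The class and function names, and the example.com domain, are placeholders of mine. It lists every link pointing at your domain, its anchor text, and whether it carries rel="nofollow".

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class BacklinkAudit(HTMLParser):
    """Collects links in a page that point at my_domain (hypothetical helper)."""

    def __init__(self, my_domain):
        super().__init__()
        self.my_domain = my_domain.lower()
        self.links = []       # tuples of (href, anchor_text, followed)
        self._current = None  # link being read, while inside a matching <a>

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href") or ""
        host = (urlparse(href).hostname or "").lower()
        # Match the bare domain and any subdomain such as www.
        if host == self.my_domain or host.endswith("." + self.my_domain):
            rel = (attrs.get("rel") or "").lower().split()
            self._current = [href, "", "nofollow" not in rel]

    def handle_data(self, data):
        if self._current is not None:
            self._current[1] += data  # accumulate the anchor text

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            href, text, followed = self._current
            self.links.append((href, " ".join(text.split()), followed))
            self._current = None


def audit_backlinks(html, my_domain):
    """Return (href, anchor_text, followed) for each link to my_domain in html."""
    parser = BacklinkAudit(my_domain)
    parser.feed(html)
    parser.close()
    return parser.links
```

Feed it the HTML of the scraped page, for example audit_backlinks(page_html, "example.com"). An empty result, or one where every link has followed set to False, suggests the scraper is stripping or nofollowing your links and is one of the more careful ones.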

Make money from scraper sites!

As an extra incentive to the savvy publisher, many scraper sites also don’t strip out any ads you run in your feeds, so if you are running CPC ads from Google in your Feedburner account, and your content is being scraped, the scraper site will not remove the ads. Therefore any clicks on the ads will be credited back to your account, and you’ll be making money on them.

Naturally, it is perfectly possible that you will come across a scraper site that is set up ‘correctly’: one that removes all the links in your content, gets rid of your related links and ads, and replaces keywords in your articles with links to things that they want to rank for, or runs some form of CPC ads of its own. Scraper sites doing this seem to be in the minority. If you do come across a site doing this with your content, they are obviously fairly determined, so a cease and desist letter probably won’t work either. Your best bet in this situation is to check your logs, find an IP address that visits your site at very regular intervals (and is not in the range of any of the search engine spiders’ IP addresses) and ban it through your .htaccess file as outlined above.
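Trawling the logs by eye is tedious, so here is a hedged sketch of how the "very regular intervals" check might be automated in Python. It assumes your server writes the common or combined Apache log format with English month names, and that your feed lives under /feed/. The function name and the thresholds (min_hits, max_jitter) are my own guesses rather than anything standard. It does not exclude search engine spiders or legitimate feed readers, which also poll on a schedule, so treat the output as a shortlist to check by hand, not a ban list.

```python
import re
import statistics
from collections import defaultdict
from datetime import datetime

# Start of a common/combined log line: IP, identd, user, [timestamp], "METHOD path
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(?:GET|HEAD) (\S+)')


def regular_feed_fetchers(log_lines, feed_path="/feed/", min_hits=5, max_jitter=0.1):
    """Return {ip: mean_gap_seconds} for IPs requesting the feed at steady intervals.

    An IP is flagged when it has at least min_hits feed requests and the spread
    of the gaps between them (standard deviation / mean) is at most max_jitter.
    """
    hits = defaultdict(list)
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match or not match.group(3).startswith(feed_path):
            continue
        when = datetime.strptime(match.group(2), "%d/%b/%Y:%H:%M:%S %z")
        hits[match.group(1)].append(when)

    suspects = {}
    for ip, times in hits.items():
        if len(times) < min_hits:
            continue
        times.sort()
        gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
        mean_gap = statistics.mean(gaps)
        # Low relative spread means the requests arrive like clockwork
        if mean_gap > 0 and statistics.pstdev(gaps) / mean_gap <= max_jitter:
            suspects[ip] = mean_gap
    return suspects
```

You would call it with an open log file, for example regular_feed_fetchers(open("access.log")), where access.log stands in for wherever your host keeps its logs. Any address it returns still needs checking against the spiders’ ranges before it goes into your .htaccess file.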

Even this is not a guarantee of stopping a determined scraper as they can, and will, change IP addresses and continue to take your content. Unfortunately you’ll just have to live with this, and stop writing such good content!

Tags: blogging tips, content theft, copyright infringement, feedburner, RSS
