How to Kick Content Scrapers in the Balls

People will do anything these days to post free content. Instead of writing something fresh, they would rather copy posts from your site and post them on their own. It’s much less evil than it is lazy.

Content scrapers think that including a by-line at the top of the post and an attribution at the bottom – or not – makes it “curation.” But it’s still copyright infringement. I’ve found that even Content Marketing “experts” will scrape content, and then advocate the practice as “Curation,” like it’s a smart thing to do.

What exactly is scraping?

According to Google, examples of scraping include:

  • Sites that copy and republish content from other sites without adding any original content or value
  • Sites that copy content from other sites, modify it slightly (for example, by substituting synonyms or using automated techniques), and republish it
  • Sites that reproduce content feeds from other sites without providing some type of unique organization or benefit to the user
  • Sites dedicated to embedding content such as video, images, or other media from other sites without substantial added value to the user

How content gets scraped

Your content can be scraped in any number of ways, both overt and covert.

Scraped via RSS

If your site has an RSS Feed, and it should, you’re making it very easy to scrape your content. Scrapers use the feed to deliver your text, images and links right to their service.

One way to combat scrapers is to change your RSS Feed from Full to Summary, so they only get the first paragraph. Trouble is, all your Feed recipients would get the Summary too, and you may not want that.
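To show what a Summary feed boils down to, here is a minimal Python sketch that keeps only the first paragraph of a post. The function name `first_paragraph` and the sample post are made up for illustration; your blogging platform has its own setting for this and may cut the text differently.

```python
import re
from html import unescape


def first_paragraph(post_html: str) -> str:
    """Return the plain text of the first <p>...</p> block in a post."""
    match = re.search(r"<p\b[^>]*>(.*?)</p>", post_html,
                      flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        # No paragraph markup: fall back to the text before the first blank line.
        body = post_html.strip().split("\n\n", 1)[0]
    else:
        body = match.group(1)
    # Drop inline tags (links, emphasis) and decode HTML entities.
    text = re.sub(r"<[^>]+>", "", body)
    return unescape(text).strip()


# Hypothetical post body, as it might appear in a Full feed.
post = ("<p>People will <em>do anything</em> to post free content.</p>"
        "<p>Second paragraph, which a Summary feed would leave out.</p>")
print(first_paragraph(post))
```

A scraper pulling this feed would get one paragraph instead of the whole post, but so would every legitimate subscriber.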

Hacking scraping

Content scrapers can become almost like hackers in the way they use fetching, AJAX, CSS hooks and markup to steal your content. Don’t believe me? Read what this assho…er, scraper has to say.

Scraping tools

In my recent post, The Difference Between Content Scrapers and Eels, I mentioned that Google might advise against web scraping, but its Chrome Web Store does offer an extension called Web Scraper to make the scraping easier!

Do scrapers merely select your text and images, copy them and paste them into their site? I don’t think so. Content scrapers are too lazy for that. So don’t bother using a plugin to prevent text selection. That will only frustrate you when you need to copy something yourself.

Draw your line

Some people say content scraping is good for deep link building and that you should just let it happen. Others say scraping is like whack-a-mole: you take one down and another pops up.

What I say is you should be vigilant and pick your battles with content scrapers by deciding what line has to be crossed before you’ll take action.

For me, since most of my stuff was scraped from Business2Community.com, with which I had an agreement to curate my stuff, my line was drawn at the links.

  • If a scraper posted an entire article, links and all, and did not link back to Blogsitestudio.com, they were on my shit list.
  • If the scraper did not include any attribution, they were really on my shit list.
  • If a scraper attributed a full post to themselves, boy did I let them have it!

But that’s just me. You might have shorter lines in the sand.

Ways to kick content scrapers in the balls

There is no reason to let content scrapers get away with copying and profiting from your writing. Here are more ways to kick them in the balls.

Email a legalese letter

Start nice. Email the scraper a sober, non-threatening letter, written by a lawyer or like a lawyer. State the fact that they are using your content illegally, cite the Digital Millennium Copyright Act (DMCA) as it applies, and make a demand.

Be sure to include a deadline so they know you’re ready to take it to the next level.

No email – try social media

Some sites are so shady, they don’t even have a Contact page or link to email of any kind. But maybe the Admin has a name. Try Googling that. If you find the person on social media, make a connection and send your demand letter that way.

[Screenshot: Takedown #2 – Thanks to demand letter]

I found one of my scrapers on LinkedIn and invited him to connect, which he did. I sent my demand letter to him, but he ignored it.

No email or social – try comments

If the scraper has no email on the site and no name to search for social media, just about the only thing you can do is place the demand letter – or an excerpt – in the post’s comments. If the site is on auto-pilot, the comment will appear. If it’s being moderated, the scraper is forced to see the demand letter.

[Screenshot: Takedown #1 – Thanks to WordPress.com]

The problem with commenting is that it might lead to an online fight, and that can look messy.

Request a Takedown

If contacting the content scrapers directly gets you no satisfaction, your next option is to complain to the hosting company under the auspices of the DMCA.

Hosted sites

What’s great about your scraper being hosted by a free platform, like WordPress.com, is that you can complain directly to them and they will be your enforcer. After all, hosts don’t want to condone copyright infringement any more than a normal person would, corporations being people and all.

I appealed to these providers’ complaint departments; googling the provider’s name plus “DMCA” will get you to the same pages. Their forms are easy to complete. Include your personal information, the infringing URL, the source URL, and a description.

In my experience, the hosts acted quickly and decisively in their takedown of my scraped posts, sometimes notifying me with an email.

Here are the providers to whom I’ve complained so far:

Self hosted sites

If the scraper has a self-hosted site, it takes a bit more work to find the correct ISP to complain to.

You can check Whois.net to find the infringing site’s hosting provider. Sometimes it’s not so easy, especially if they pay to conceal the information.

Here’s what an above-board ISP lookup looks like:

[Screenshot: Whois lookup for a site hosted on Bluehost]

The Nameserver reveals the host to whom you will complain.
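If you would rather not squint at raw Whois output, the nameserver step can be sketched in a few lines of Python. This is only an illustration: the function names and the sample record are hypothetical, the "Name Server:" label varies between registries, and a nameserver can point at a DNS or proxy service rather than the actual host.

```python
import re
from typing import List, Optional


def nameservers(whois_text: str) -> List[str]:
    """Collect the name server hostnames listed in raw Whois output."""
    found: List[str] = []
    for line in whois_text.splitlines():
        match = re.match(r"\s*(?:name\s*server|nserver)\s*:\s*(\S+)",
                         line, flags=re.IGNORECASE)
        if match:
            host = match.group(1).lower().rstrip(".")
            if host not in found:
                found.append(host)
    return found


def likely_host(whois_text: str) -> Optional[str]:
    """Guess the hosting company's domain from the first name server."""
    servers = nameservers(whois_text)
    if not servers:
        return None
    # ns1.bluehost.com -> bluehost.com (naive: keeps only the last two labels)
    return ".".join(servers[0].split(".")[-2:])


# Made-up Whois record for a made-up infringing domain.
sample = """Domain Name: EXAMPLE-SCRAPER.COM
Name Server: NS1.BLUEHOST.COM
Name Server: NS2.BLUEHOST.COM
"""
print(likely_host(sample))
```

Treat the result as a lead, not proof, and confirm it on the host's own site before you send a complaint.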

Using DMCA

You can certainly craft a letter using the language of the DMCA and the pertinent information.

Here is a sample DMCA Takedown Request you can use. You can send that to the abuse email for the ISP, whilst CCing the site owner.
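If you have several scraped posts to report, filling in the same letter by hand gets old. Below is a rough Python sketch of a fill-in-the-blanks notice built from the same details the hosts' forms ask for: your contact information, the source URL, the infringing URL, and a description. It is not the sample request mentioned above. The wording is illustrative rather than lawyer-vetted, and every name and URL in it is a placeholder.

```python
# Illustrative wording only; have a lawyer review whatever you actually send.
NOTICE = """\
To the designated DMCA agent:

I am the copyright owner of the work published at:
  {source_url}

The following page reproduces that work without my permission:
  {infringing_url}

Description: {description}

I have a good faith belief that this use is not authorized by the copyright
owner, its agent, or the law. The information in this notice is accurate, and
under penalty of perjury, I am the owner of the copyright involved.

Please remove or disable access to the infringing material.

{owner}
{email}
"""


def takedown_notice(owner: str, email: str, source_url: str,
                    infringing_url: str, description: str) -> str:
    """Fill the template with the details of one scraped post."""
    for name, value in (("source_url", source_url),
                        ("infringing_url", infringing_url)):
        # Hosts need exact page addresses, not just a site name.
        if not value.startswith(("http://", "https://")):
            raise ValueError(f"{name} must be a full URL, got {value!r}")
    return NOTICE.format(owner=owner, email=email, source_url=source_url,
                         infringing_url=infringing_url, description=description)


# Placeholder details for a hypothetical complaint.
print(takedown_notice(
    owner="Jane Blogger",
    email="jane@example.com",
    source_url="https://example.com/my-original-post/",
    infringing_url="https://scraper.example/stolen-copy/",
    description="Full text and images copied without permission or attribution.",
))
```

The URL check is there because a complaint is only useful if it names the exact pages involved.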

I found that DMCA.com offers a DIY Takedown service called Website Protection Pro that is pretty cool. For $10 per month (or less with a coupon found on the Internet) you can use the service for an unlimited number of cases. They provide you with forms to complete, which you save as PDFs and send to ISPs.

[Screenshot: Takedown #4 – Thanks to DMCA]

So, for 7 bucks I used the DMCA DIY Takedown service and was surprised to find that not only was my scraped content removed, but the entire site was taken down!

The Managed Takedown service also offers a Lookup Tool to find the correct ISP for the infringing site, similar to Whois.net. DMCA.com also offers badges to place on your site to let scrapers know you have “protection.”

[Screenshot: Takedown #3 – Thanks to Tumblr]

(The “Content Marketing” scraper whose site was removed promptly contacted me through Facebook to apologize. I checked out his other site where his most current post described the art of content “curation.” Bugger.)

Also, the Chilling Effects database collects and analyzes legal complaints and requests for removal of online materials, helping Internet users to know their rights and understand the law. The data enables them to study the prevalence of legal threats and lets Internet users see the source of content removals.

Screw the scrapers

If getting a scraped post or site taken down is not satisfying enough, you can hit scrapers harder where it hurts.

In How to Identify Content Thieves and Hit Them Where it Hurts, Jennifer Mattern outlines more fun ways to kick content scrapers in the balls.

Getting stolen content de-indexed from search engines

First, I contact the major search engines with DMCA requests to have the infringing material removed from their search results. This way, if the scraper happens to be getting any search traffic, that can be shut down.

Stripping their ad revenue

If the site owner is using an ad network to serve ads on the infringing content, report them to the network. They’re almost guaranteed to be in violation of the ad network’s terms. After all, the advertisers paying that ad network don’t want their ads running alongside illegally-published material. So the network is in a position to take action.

If they have private advertisers, you could also reach out to them. Chances are they’ll discontinue their ad contracts if they find out their company is associating with a site owner who openly breaks the law. Some won’t. And it won’t always be worth your time if there are many private advertisers.

Go above their head

This site was also hotlinking my images (instead of hosting a copy themselves, they were loading it directly from my server to their site, which steals your bandwidth)…I had a bit of fun redirecting image files to an “I steal content” image when they were loaded from his site.
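The hotlink trick comes down to checking which site asked for the image. Here is a minimal Python sketch of that decision, assuming a hypothetical scraper domain and a hypothetical replacement image path. In practice this kind of rule usually lives in the web server's configuration rather than in application code, and the Referer header can be missing or faked, so it only catches the lazy cases.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical values: the offending site and the image to shame it with.
SCRAPER_HOSTS = {"scraper.example"}
SHAME_IMAGE = "/images/i-steal-content.png"


def image_to_serve(requested_path: str, referer: Optional[str]) -> str:
    """Pick the file to send, based on the page that embedded the image."""
    host = urlparse(referer or "").hostname or ""
    if host.startswith("www."):
        host = host[4:]
    if host in SCRAPER_HOSTS:
        # The request came from a page on the scraper's site: swap the image.
        return SHAME_IMAGE
    # Everyone else, including visitors who send no Referer, gets the real file.
    return requested_path


print(image_to_serve("/uploads/chart.png", "https://www.scraper.example/stolen-post/"))
print(image_to_serve("/uploads/chart.png", "https://blogsitestudio.com/some-post/"))
```

Listing specific offenders, rather than blocking every outside site, keeps your images working in feed readers and image search.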

Take advantage

Benjamin Ehinger, in How to Prevent Content Scraping on Your Work, describes An Approach Allowing You to Take Advantage of Content Scrapers.

  • Auto Link Keywords – With a plugin, such as SEO Smart Links, you can actually replace keywords with affiliate links. This will help you gain even more links pointing to your affiliate account when a scraper steals your content.
  • Internal Linking – When you add a large amount of internal links to your posts, scrapers will actually link back to your posts when they use your content. This is a good way to get backlinks and steal some of their visitors.
  • Use an RSS Footer – With the RSS Footer plugin or the feature in WordPress SEO by Yoast, you can create an RSS Footer. This can be customized however you want and you can promote your own products in your RSS. This will help you when a content scraper steals your content.
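To make the RSS Footer idea concrete, here is a small Python sketch of what such a footer does to each feed item: it tacks an attribution link onto the end of the content, so a scraper republishing the feed republishes the link too. The function name and footer wording are illustrative assumptions, not the actual output of either plugin.

```python
from html import escape


def add_rss_footer(item_html: str, title: str, url: str, site_name: str) -> str:
    """Append an attribution paragraph linking back to the original post."""
    footer = (
        f'<p>The post <a href="{escape(url, quote=True)}">{escape(title)}</a> '
        f"appeared first on {escape(site_name)}.</p>"
    )
    return item_html + "\n" + footer


item = "<p>Content scrapers are like eels.</p>"
print(add_rss_footer(
    item,
    title="The Difference Between Content Scrapers and Eels",
    url="https://blogsitestudio.com/difference-between-content-scrapers-and-eels/",
    site_name="Blogsitestudio.com",
))
```

The footer only appears in the feed, so readers on your own site never see it.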

Do something

If your posts have been scraped, whatever you do, don’t do nothing to kick content scrapers in the balls.

Please do something to make lazy-ass site owners think twice about stealing your content and possibly damaging your reputation, sucking your link juice, outranking your original posts, and making the Internet a cesspool of repetitive content. You’ll be helping to make the Internet a better place.

So, what are you doing to stop content scrapers from scraping your content?

6 Responses to How to Kick Content Scrapers in the Balls

  1. Teddy Rose July 3, 2015 at 12:05 am #

    Great article. Now it seems that many blogs have a reblog button to click on to re-post someone else’s content on your website. I have seen WordPress sites with this. I have been wondering why the encouragement to bloggers to re-post other’s content. I have also been wondering if it could actually hurt your Google ranking if you were to re-post others content.

    Do you know if there is a way to find out if someone has scraped your content, like a Google search?

  2. Hartley July 3, 2015 at 10:17 am #

    Hey, I’m the “assho…er” scraper that you linked to. If you’d poked around my site a bit more before trying to skewer me in your article, you’d see that I also wrote an article called “Preventing Web Scraping: Best Practices for Keeping Your Content Safe” which lays out many technical methods for trying to prevent your content from being scraped: https://blog.hartleybrody.com/prevent-scrapers/

    Like using a hammer, scraping is just a tool that can be wielded for good or evil. Writing about it doesn’t make someone worthy of scorn.

  3. Mari Kane July 3, 2015 at 11:32 am #

    Thanks Teddy. I too wonder why there is both a push to scrape and a pushback against it. It can hurt both your ranking and the scraper’s. It’s confounding.

    And yes there are ways to discover scraping, as I pointed out in my previous post, https://blogsitestudio.com/difference-between-content-scrapers-and-eels/.

    Best way is to enter your title with quotation marks into Google and search. Your post should be at the top, but if not, it’s been scraped.

    Have to say, working with DMCA does give one a sense of righteous power.

    Cheers to that!

  4. Bill Sanderson July 3, 2015 at 5:51 pm #

    An interesting article on how to respond if you are scraped. But how do you know that your work has been scraped in the first place?

  5. Mari Kane July 5, 2015 at 12:11 pm #

    There are a couple of easy ways, Bill.

    In Webmaster Tools, check out who’s linking to you.
    Or google the title of the post, in quotes.

    I have set some titles to Google Alerts, with quotations, in case new scrapes pop up.

    There’s more info here:
    https://blogsitestudio.com/difference-between-content-scrapers-and-eels/.

    Cheers!

  6. Mari Kane July 6, 2015 at 11:57 pm #

    Thanks for checking in Hartley!

    So what you’re saying is that you work both sides of the fence: warning people about preventing scraping while advising them on precisely how to do it. Admirable.

    Now tell me, exactly what good do you think comes from content scraping?
