People will do anything these days to post free content. Instead of writing something fresh, they would rather copy posts from your site and post them on their own. It’s much less evil than it is lazy.
Content scrapers think that including a by-line at the top of the post and an attribution at the bottom – or not – makes it “curation.” But it’s still copyright infringement. I’ve found that even Content Marketing “experts” will scrape content, and then advocate the practice as “Curation,” like it’s a smart thing to do.
What exactly is scraping?
According to Google, examples of scraping include:
- Sites that copy and republish content from other sites without adding any original content or value
- Sites that copy content from other sites, modify it slightly (for example, by substituting synonyms or using automated techniques), and republish it
- Sites that reproduce content feeds from other sites without providing some type of unique organization or benefit to the user
- Sites dedicated to embedding content such as video, images, or other media from other sites without substantial added value to the user
How content gets scraped
Your content can be scraped in any number of ways, both overt and covert.
Scraped via RSS
If your site has an RSS Feed, and it should, you’re making it very easy to scrape your content. Scrapers use the feed to deliver your text, images and links right to their service.
One way to combat scrapers is to change your RSS Feed from Full to Summary, so they only get the first paragraph. Trouble is, all your Feed recipients would get the Summary, and you may not want that.
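To see what a Summary feed actually hands over, here is a rough Python sketch. It assumes your post is plain text with paragraphs separated by blank lines. The function names and the example.com permalink are made up for illustration, and your blog platform's own feed setting does the real work.

```python
def first_paragraph(post_text: str) -> str:
    """Return only the first non-empty paragraph of a post."""
    for block in post_text.split("\n\n"):
        block = block.strip()
        if block:
            return block
    return ""


def feed_summary(post_text: str, permalink: str) -> str:
    """Build a summary-only feed entry that points back to the full post."""
    return f"{first_paragraph(post_text)}\n\nRead the full post: {permalink}"


if __name__ == "__main__":
    # Hypothetical post and permalink, for illustration only.
    post = "Scrapers are lazy.\n\nHere is the rest of the article."
    print(feed_summary(post, "https://example.com/scrapers"))
```

A scraper pulling this feed gets the teaser and a link back to you, and nothing else.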
Content scrapers can become almost like hackers in the way they use fetching, AJAX, CSS Hooks and Markup to steal your content. Don’t believe me? Read what this assho…er, scraper has to say.
In my recent post, The Difference Between Content Scrapers and Eels, I mentioned that Google might advise against web scraping, but it does offer a Chrome app called Web Scraper to make the scraping easier!
Do scrapers merely select your text and images, copy them and paste them into their site? I don’t think so. Content scrapers are too lazy for that. So don’t bother using a plugin to prevent text selection. That will only frustrate you when you need to copy something yourself.
Draw your line
Some people say content scraping is a good thing for deep link building and that you should just let it happen. Others say scraping is like whack-a-mole: you take one down and another pops up.
What I say is you should be vigilant and pick your battles with content scrapers by deciding what line has to be crossed before you’ll take action.
For me, since most of my stuff was scraped from Business2Community.com, with which I had an agreement to curate my stuff, my line was drawn at the links.
- If a scraper posted an entire article, links and all, and did not link back to Blogsitestudio.com, they were on my shit list.
- If the scraper did not include any attribution, they were really on my shit list.
- If a scraper attributed a full post to themselves, boy did I let them have it!
But that’s just me. You might have shorter lines in the sand.
Ways to kick content scrapers in the balls
There is no reason to let content scrapers get away with copying and profiting from your writing. Here are more ways to kick them in the balls.
Email a legalese letter
Start nice. Email the scraper a sober, non-threatening letter, written by a lawyer or like a lawyer. State the fact that they are using your content illegally, point out the law as it applies under the Digital Millennium Copyright Act (DMCA), and make a demand.
Be sure to include a deadline so they know you’re ready to take it to the next level.
No email – try social media
Some sites are so shady, they don’t even have a Contact page or link to email of any kind. But maybe the Admin has a name. Try Googling that. If you find the person on social media, make a connection and send your demand letter that way.
I found one of my scrapers on LinkedIn and invited him to connect, which he did. I sent my demand letter to him, but he ignored it.
No email or social – try comments
If the scraper has no email on the site and no name to search for on social media, just about the only thing you can do is place the demand letter – or an excerpt – in the post’s comments. If the site is on auto-pilot, the comment will appear. If it’s being moderated, the scraper is forced to see the demand letter.
The problem with commenting is that it might lead to an online fight, and that can look messy.
Request a Takedown
If you get no satisfaction from contacting the content scrapers directly, your next option is to complain to the hosting company under the DMCA.
What’s great about your scraper being hosted by a free platform, like WordPress.com, is that you can complain directly to them and they will be your enforcer. After all, hosts don’t want to condone copyright infringement any more than a normal person would, corporations being people and all.
I appealed to these complaint departments, and googling the provider and “DMCA” will get you to the same pages. Their forms are easy to complete. Include your personal information, the infringing URL, the source URL, and a description.
In my experience, the hosts acted quickly and decisively in their takedown of my scraped posts, sometimes notifying me with an email.
Here are the providers to whom I’ve complained so far:
Self hosted sites
If the scraper has a self-hosted site, it takes a bit more work to find the correct ISP to whom to complain.
You can check Whois.net to find the infringing site’s hosting provider. Sometimes it’s not so easy, especially if they pay to conceal the information.
Here’s what an above board ISP lookup looks like:
The Nameserver reveals the host to whom you will complain.
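If you'd rather not squint at raw lookup text, a few lines of Python can pull the nameservers out for you. This is only a sketch: the sample record, the domain names in it, and the function names are all invented for illustration, and real Whois output varies from registrar to registrar.

```python
import re

# Hypothetical lookup output; real records differ by registrar.
SAMPLE_WHOIS = """\
Domain Name: EXAMPLE-SCRAPER.COM
Registrar: Example Registrar, Inc.
Name Server: NS1.EXAMPLEHOST.COM
Name Server: NS2.EXAMPLEHOST.COM
"""

NAMESERVER_LINE = re.compile(
    r"^[ \t]*Name ?Server:[ \t]*(\S+)", re.IGNORECASE | re.MULTILINE
)


def nameservers(whois_text: str) -> list[str]:
    """Collect the unique nameserver hostnames, lowercased and sorted."""
    return sorted({name.lower() for name in NAMESERVER_LINE.findall(whois_text)})


def host_domains(servers: list[str]) -> list[str]:
    """Reduce each nameserver to its last two labels.

    Naive on purpose: it would get suffixes like .co.uk wrong.
    """
    return sorted({".".join(s.rstrip(".").split(".")[-2:]) for s in servers})


if __name__ == "__main__":
    servers = nameservers(SAMPLE_WHOIS)
    print(servers)
    print(host_domains(servers))
```

One caution: the nameserver sometimes belongs to a DNS or CDN provider rather than the actual web host, so treat the result as a lead and confirm it before you send your complaint.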
You can certainly craft a letter using the language of the DMCA and the pertinent information.
Here is a sample DMCA Takedown Request you can use. You can send that to the abuse email for the ISP, whilst CCing the site owner.
I found that DMCA offers a DIY Takedown service called Website Protection Pro that is pretty cool. For $10 per month (or less with a coupon found on the Internet) you can use the service for an unlimited number of cases. They provide you with forms to complete, which you save as PDFs and send to ISPs.
So, for 7 bucks I used the DMCA DIY Takedown service and was surprised to find that not only was my scraped content removed, but the entire site was taken down!
The Managed Takedown service also offers a Lookup Tool to find the correct ISP for the infringing site, similar to Whois.net. And, DMCA offers badges to place on your site to let scrapers know you have “protection.”
(The “Content Marketing” scraper whose site was removed promptly contacted me through Facebook to apologize. I checked out his other site where his most current post described the art of content “curation.” Bugger.)
Also, the Chilling Effects database collects and analyzes legal complaints and requests for removal of online materials, helping Internet users to know their rights and understand the law. The data enables them to study the prevalence of legal threats and lets Internet users see the source of content removals.
Screw the scrapers
If getting a scraped post or site taken down is not satisfying enough, you can hit scrapers harder where it hurts.
In How to Identify Content Thieves and Hit Them Where it Hurts, Jennifer Mattern outlines more fun ways to kick content scrapers in the balls.
Getting stolen content de-indexed from search engines
First, I contact the major search engines with DMCA requests to have the infringing material removed from their search results. This way, if the scraper happens to be getting any search traffic, that can be shut down.
Stripping their ad revenue
If the site owner is using an ad network to serve ads on the infringing content, report them to the network. They’re almost guaranteed to be in violation of the ad network’s terms. After all, the advertisers paying that ad network don’t want their ads running alongside illegally-published material. So the network is in a position to take action.
If they have private advertisers, you could also reach out to them. Chances are they’ll discontinue their ad contracts if they find out their company is associating with a site owner who openly breaks the law. Some won’t. And it won’t always be worth your time if there are many private advertisers.
Go above their head
This site was also hotlinking my images (instead of hosting a copy themselves, they were loading it directly from my server to their site, which steals your bandwidth)…I had a bit of fun redirecting image files to an “I steal content” image when they were loaded from his site.
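The hotlink trick boils down to checking where an image request came from. Here is a minimal Python sketch of that decision, not the actual setup described above. The domain names, the replacement image path, and the function name are hypothetical, and in practice this logic usually lives in your web server's configuration.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical values; swap in your own domains and image path.
MY_HOSTS = {"example.com", "www.example.com"}
REPLACEMENT_IMAGE = "/images/i-steal-content.png"


def image_to_serve(requested_path: str, referer: Optional[str]) -> str:
    """Pick which image path to serve based on the Referer header."""
    if requested_path == REPLACEMENT_IMAGE:
        return requested_path  # never swap out the replacement itself
    if not referer:
        return requested_path  # direct visits and many readers send no Referer
    host = (urlparse(referer).hostname or "").lower()
    if host in MY_HOSTS:
        return requested_path  # the request came from your own pages
    return REPLACEMENT_IMAGE  # someone else's page is loading your image


if __name__ == "__main__":
    print(image_to_serve("/images/photo.jpg", "https://www.example.com/post"))
    print(image_to_serve("/images/photo.jpg", "https://scraper.example.net/copy"))
```

Before going live, you would probably want to allow feed readers, search engines, and social networks as well, or their previews of your posts will show the "I steal content" image too.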
Benjamin Ehinger, in How to Prevent Content Scraping on Your Work, describes An Approach Allowing You to Take Advantage of Content Scrapers.
- Auto Link Keywords – With a plugin, such as SEO Smart Links, you can actually replace keywords with affiliate links. This will help you gain even more links pointing to your affiliate account when a scraper steals your content.
- Internal Linking – When you add a large amount of internal links to your posts, scrapers will actually link back to your posts when they use your content. This is a good way to get backlinks and steal some of their visitors.
- Use an RSS Footer – With the RSS Footer plugin or the feature in WordPress SEO by Yoast, you can create an RSS Footer. This can be customized however you want and you can promote your own products in your RSS. This will help you when a content scraper steals your content.
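To picture what an RSS footer does, here is a small Python sketch that tacks an attribution line onto each feed item. It is an illustration only, not how the plugins above are implemented; the function name, the footer wording, and the example values are assumptions.

```python
from html import escape


def with_feed_footer(item_html: str, title: str, permalink: str, site_name: str) -> str:
    """Append an attribution footer with a link back to the original post."""
    footer = (
        f'<p>The post <a href="{escape(permalink, quote=True)}">{escape(title)}</a> '
        f"first appeared on {escape(site_name)}.</p>"
    )
    return item_html + "\n" + footer


if __name__ == "__main__":
    # Hypothetical post details, for illustration only.
    print(
        with_feed_footer(
            "<p>Scrapers are lazy.</p>",
            "Scrapers & Eels",
            "https://example.com/scrapers-and-eels",
            "Example Blog",
        )
    )
```

Because the footer travels inside the feed, a scraper who republishes the item automatically also republishes the link back to you.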
If your posts have been scraped, whatever you do, don’t do nothing. Kick content scrapers in the balls.
Please do something to make lazy-ass site owners think twice about stealing your content and possibly damaging your reputation, sucking your link juice, outranking your original posts, and making the Internet a cesspool of repetitive content. You’ll be helping to make the Internet a better place.
So, what are you doing to stop content scrapers from scraping your content?