Scraping Old Content From Dead Sites Can Still Be Copyright Infringement And Can Still Cause SEO Issues

In the past month or so I’ve unsubscribed from at least 5 so-called “SEO blogs” because they told people to break the law. Scrapers will traditionally use live content, content that is fresh and new. However, these SEO blogs told people to scrape from dead sites instead of live ones.

The process basically goes like this:

  1. Find dead site
  2. Go on the Internet Archive
  3. Steal content

Content on a dead site is not automatically free of copyright. When a webmaster abandons or shuts down a site, they still hold the copyright on it, so you can’t copy it unless they explicitly state “This content is in the public domain” or release it under a license that allows copying.

In addition, scraping dead content can still cause you problems even if it isn’t under copyright. Other scrapers may have already taken the same text, and that means duplicate content. Chances are that if the search engines found another scraper’s version before yours, they will treat that copy as the “original”, which will not harm them but could harm you.
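
To get a feel for how easily copied text can be spotted, here is a minimal Python sketch. It is only an illustration: search engines do not publish how they detect duplicates, so this is not their method. The function names (`shingles`, `jaccard_similarity`) and the 3-word window are assumptions made for this example. The idea is to break each text into overlapping word groups and measure how many of them the two texts share.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word groups ("shingles") in text."""
    # Lowercase and keep only word-like tokens so punctuation is ignored.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one window: treat the whole thing as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(text_a, text_b, size=3):
    """Share of shingles the two texts have in common, from 0.0 to 1.0."""
    a = shingles(text_a, size)
    b = shingles(text_b, size)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    original = "Write your own content and stick with your own rules."
    scraped = "Write your own content and stick with your own rules!"
    print(jaccard_similarity(original, scraped))
```

A score near 1.0 means the two texts are almost word-for-word copies; a score near 0.0 means they share very little phrasing. Even a crude check like this flags a scraped post, so slightly reworded copies are unlikely to fool anything more sophisticated.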

Let’s not forget the fact that it is completely unethical. Ethics on the Internet has been going downhill for some time (it’s become much more noticeable since the social networking boom). I tend to compare people without ethics to monkeys who throw their own poo. They are both stupid and lame. Set yourself some good rules and stick with them for the rest of your life.

Personally I think any form of content theft is wrong, even if it is as simple as using a press release (which can also be copyrighted, by the way). How often do you see the really popular bloggers do that? Do you think using someone else’s text and claiming it as your own makes you look like a better blogger? It doesn’t.

Write your own content!

And if you are really out of ideas, why not try guest blogging?

Please subscribe, or else I will cry. Do you really want to make a programmer cry?

8 Comments

  1. Jonathan Bailey Says:

    An excellent article and a point I had not considered. Bravo!

    I’ll have to write a follow up about this.

    But here’s the critical thing to remember: Life plus seventy.

    If the author has not been dead for seventy years, it is still copyright protected. It doesn’t matter if they destroy it, remove it from the Web, renounce it or are ashamed of it, they still hold copyright protection and, unless they place it into the public domain or CC license it, the default is do not copy.

    It is worth saying that people who have abandoned their sites are much less likely to search for their content and try to stop infringement. However, that isn’t to say that they didn’t just pack up and move either…

  2. Jeremy Steele Says:

    Would be creepy if some dead person’s site got ripped off and they somehow sued :shock:

  3. Jonathan Bailey Says:

    Well, you can will your copyrights and then have a loved one sue…

    It’s happened.

  4. Jeremy Steele Says:

    Out of curiosity are you the one using commentful from blogflux? If so, how well does it work?

  5. Jonathan Bailey Says:

    Yes, I am using it. It’s working quite well actually. I needed to keep track of my comments and, sadly, not all blogs offer the neat “subscribe via email” feature your site and mine provide.

    I’ve liked it so far, especially with the FF plugin, and it seems to work with more sites than Co.Comment. I’m early in the experiment, and it is indeed an experiment, but I am impressed.

    If you want I’ll keep you posted…

  6. Jeremy Steele Says:

    Yeah, sure, just drop a comment or e-mail or whatever with your experiments and such. Thanks ahead of time.

  7. Florchakh Says:

    I totally agree with you. I’m sick of people who just copy the entire content of my posts and think everything is okay once they add a linkback. Naturally that linkback is useless, since Uncle Google applies its duplicate content filter. :mrgreen:

    By the way, it’s much better to generate :twisted:

  8. Jeremy Steele Says:

    A few months back I used to let sploggers go if they stole really small posts and added linkbacks (if they are followed), but I switched to a zero tolerance policy and wrote myself a nice script that lets me keep my sanity while filing DMCA after DMCA. Just insert their URL, name, the original post, and boom, done.
