Thiefinder - Cool PHP Script That Logs Possible Bandwidth Theft

Updated: 10/4/07 (View Change Log)

Thiefinder is a lightweight script that logs possible bandwidth theft. It is not a tool for preventing theft; instead, it is a tool you can use to find out who is stealing your bandwidth without looking through huge access logs.

Depending on your site’s configuration, installation can either be a breeze or a pain in the butt, because you have to modify .htaccess files.

1) Get It - You can download the source code for Thiefinder here. Once downloaded, change the extension to .php.

2) Upload And Configure It - Generally speaking there is no real reason to change any of the settings unless you want a different log file name or if you want the log to contain more than 100 entries (the default). If you don’t know PHP, don’t mess around with it too much.

Once configured, upload the script and make sure the log folder is writable. By default Thiefinder will store logs in the script’s directory. Contact your host for help setting up your permissions if you don’t know what to use.

3) Set Up .htaccess - You can either use my .htaccess generator to get the code needed, or you can set it up manually. I would recommend using code similar to this for your .htaccess files:

RewriteEngine on

RewriteBase /

#Rewrite info for Thiefinder
#is the referer this site?
RewriteCond %{HTTP_REFERER} !^http(s)?://(.+\.)?example\.com [NC]
#is it blank?
RewriteCond %{HTTP_REFERER} !^$
#if another site is referring to this page, do rewrite magic
RewriteRule \.(jpg|jpeg|png|gif)$ /path/to/thiefinder.php [NC,L]

Replace example.com with your domain, then change /path/to/thiefinder.php to the path to Thiefinder. To allow other domains to view your images, copy the first RewriteCond line once per domain and change the domain in each copy. I highly recommend placing this .htaccess file in the directory where you store your pictures, instead of the root directory.
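
As a rough sketch of the "other domains" step, assuming a hypothetical second site called partner-site.example that should be allowed to embed your images, the conditions might look like this:

```apache
# Allow this site (replace example.com with your domain)
RewriteCond %{HTTP_REFERER} !^http(s)?://(.+\.)?example\.com [NC]
# Hypothetical extra allowed domain; the name is only an illustration
RewriteCond %{HTTP_REFERER} !^http(s)?://(.+\.)?partner-site\.example [NC]
# Blank referers are still let through
RewriteCond %{HTTP_REFERER} !^$
RewriteRule \.(jpg|jpeg|png|gif)$ /path/to/thiefinder.php [NC,L]
```

Consecutive RewriteCond lines are combined with AND by default, so a request is only rewritten to Thiefinder when its referer matches none of the allowed domains and is not blank.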

To keep Google, Live, and Yahoo image searches from being thrown in your log, put this code in your .htaccess file as well (above the “RewriteRule \.(jpg|jpeg|png|gif)$…” line):

RewriteCond %{HTTP_REFERER} !^https?://(.*\.)?google\.com [NC]
RewriteCond %{HTTP_REFERER} !^https?://(.*\.)?live\.com [NC]
RewriteCond %{HTTP_REFERER} !^https?://(.*\.)?yahoo\.com [NC]

4) Enjoy - As crazy as it might sound, it didn’t even take 3 seconds for my install of Thiefinder to start logging bandwidth theft on my server.

Oh yeah, if you find any bugs just let me know in the comments, and if you use this script why not show a lil’ love and link to this post?

5) WordPress Users, Read This! - If you use WordPress you may want to take a look at my article on how to keep WordPress from writing over the .htaccess file. Because it auto-creates the file when updating your blog’s preferences, Thiefinder may not load depending on how you have everything set up.
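
One common arrangement, which may or may not be what the linked article describes, relies on the fact that WordPress only regenerates the section between its own "# BEGIN WordPress" and "# END WordPress" markers. A minimal sketch, assuming the Thiefinder rules live in the same .htaccess file as the WordPress rules:

```apache
# Thiefinder rules kept outside the WordPress markers (assumed placement)
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^http(s)?://(.+\.)?example\.com [NC]
RewriteCond %{HTTP_REFERER} !^$
RewriteRule \.(jpg|jpeg|png|gif)$ /path/to/thiefinder.php [NC,L]

# BEGIN WordPress
# (rules generated by WordPress stay here untouched)
# END WordPress
```

Anything placed between the two markers can be rewritten when the blog’s settings are saved, so custom rules are safer above or below that block.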

Want To Help?

If you want to help out with development, feel free to make changes to the source code and e-mail the modified source (with a list of changes) to me. I’ll review it and if the changes are useful, I will include them in the next release.

Everyone who helps out with development will be recognized in the credits and will get a link on this page. Oh, and don’t forget a warm fuzzy feeling.

If you are going to branch out and make a new project based off of Thiefinder, please leave my name in the credits and give credit where credit is due. Oh, and why not drop me an e-mail so I can list your project here?

Special Thanks

Special thanks to:

Lincoln and
Jonathan

for letting me know about bugs and helping to make Thiefinder even better!


24 Comments

  1. Lincoln Says:

    Hiya, just came from Blog Catalog, and man does this sound like exactly what I’m looking for (well almost). Check my latest post here:

    Hotlinking, SEO and Backlinks

    I mentioned a plugin that basically disabled rightclicking an image and instead gives the would be hotlinker an alternative snippet of code he can use instead. The code creates a backlink back to the site, a great way to actually derive some benefit from people who hotlink your images. Unfortunately it didn’t work too well on my site, but it might on yours. :smile:

    I’d disable hotlinking altogether but it screws up my feeds, and even then I still want my images to be indexed for the added traffic bonus it brings. That left me with the daunting task of sifting through my huge logs to check for hotlinking. If your script works it’s going to save me a LOT of trouble. Kudos for thinking up this. :mrgreen:

  2. Jeremy Steele Says:

    Thanks! If you need any help setting it up just let me know, and like I said in the post, I will be making a much easier version some time.

    The really handy thing is unlike traffic logs from awstats and such (which require you to login to your cpanel), you can add this to your bookmark toolbar and check up every once in a while (although you can password protect the log dir if you wish).

    Eventually I may even add in an option to create a blacklist you can set up to completely disable access.

  3. Lincoln Says:

    Thanks, I’ll give it a whirl when I have time to kill. Don’t want to end up blowing something up because I was too distracted. :mrgreen:

  4. Lincoln Says:

    Just an update, I made a go of it and it worked! In fact just like you it already started logging hotlinks seconds later, initially coming from MySpace. Figures. Damn those people. :evil:

    The only thing is I noticed it also seemed to be logging images linked from, uhhhh, my own site, like so:

    69.blah.blah.blah http://www.mysite.com /wp-content/uploads/image.png

    Is this normal behavior, or is there some reason this popped up? I tried to reproduce the results by accessing the articles containing the image in question, and then accessing them directly, with no effect. For some reason this particular IP address triggered the script to log it. Hmmmm…

    If you can find a way to filter links from Google Image Searches and so on, I think you’ll end up having a very popular (and maybe lucrative) script on your hand. For the life of me I cannot understand why with all the hotlinking hoopla out there nobody thought to provide a log analyzer that could make it easy even for a novice to check for hotlinking. Sheesh. :roll:

  5. Jeremy Steele Says:

    What does your htaccess file look like?

    The line for the example file I have above that looks like this:

    RewriteCond %{HTTP_REFERER} !^http(s)?://(www\.)?example.com/ [NC]

    Should have the example.com bit changed to your domain. (leave the http(s)?://(www\.)? thing attached though)

  6. Lincoln Says:

    Yep it’s written just as you indicated, with “example” replaced by my site address. Wait, isn’t it missing a slash there right after example? :shock:

    Wouldn’t it actually be better to set the rewrite this way:

    RewriteCond %{HTTP_REFERER} ^http://(.+\.)?example\.com/ [NC,OR]

    where the (.+\.) replaces the (www\.)? Not that I even know what it means, but I’m assuming that would cover all subdomains.

    Oh, and I’m also getting this warning:

    PHP Warning: fopen(/home/yoholmes/public_html//wp-content/uploads/thumb-IDF%20biggun.jpg) [function.fopen]: failed to open stream: No such file or directory in /home/yoholmes/public_html/logs/thiefinder.php on line 78

    I’m assuming this might be due to the double slash ?? in that string, but I’m not sure. Might be something you want to look out for.

    BTW, completely unrelated question: are you using a plugin to automatically convert url addresses into links here in the comments? I noticed it wasn’t doing that for my blog but I wasn’t sure if that was because WordPress was natively set up that way or if a plugin was interfering. :oops:

  7. Jeremy Steele Says:

    I think wordpress is set up to do the link thing by default.

    The php warning is probably due to the double slash, and I can put in something to test if the file exists first and junk, and return a 404 if it doesn’t exist. Bit of a whoops.

    That mod rewrite should include all subdomains, but i’ll do some testing to make sure that is true.

    Nice catch with the \ after example, I am not 100% sure if the slashes are required though. Some examples I see have them, and some don’t.

  8. Lincoln Says:

    That’s why I hate htaccess language, it all gives me a bloody headache. In any event, I added the slashes and used the (.+\.) string, so far it hasn’t broken anything. I’ve already started a blacklist in my root htaccess based on the new theft log. Really quite fun to see all the 403 hits I’m now getting. :mrgreen:

    Now to see about using a mischievous replacement image that can either advertise my blog or tell hotlinkers where to stick it. :twisted:

  9. Jeremy Steele Says:

    Just updated it, and it now checks to make sure the file exists before using fopen to open it (and print the image to the browser). Also added in other php nifty-ness to avoid the double slash problem. I am not 100% sure why it would cause a problem for you though, even with double slash it worked fine for me… oh well. Different server configs I guess.

    Stupid dreamweaver - I tried uploading the .txt file for the code and it kept cutting off 99% of the file lol.

    I’ll get working on whitelist/blacklist functionality asap.

  10. Lincoln Says:

    Niiiice. :mrgreen: There’s some null characters though right at the beginning of the script that you need to get rid of though (shows up after your blog url).

    Works great now. I’m getting all teary eyed here, *sniff*. So beautiful… :oops:

    I stumbled this page, hope you don’t mind. It’s the least I could do. :mrgreen:

  11. Jeremy Steele Says:

    Thanks for the stumble.

    I wonder how the null characters got there? Hmm…

  12. Lincoln Says:

    Oops, looks like there’s a couple of new issues:

    For some reason it’s blatantly disregarding the blacklist I set up in my root htaccess file. It was working fine before, but this new update cancels out my blacklist and allows the images to be hotlinked. ???? It was working fine before in conjunction with the original version of the script, so I don’t know what happened.

    Also, I’m still getting a PHP warning:

    PHP Warning: fopen(/home/yoholmes/public_html/wp-content/uploads/thumb-IDF%20gun.jpg) [function.fopen]: failed to open stream: No such file or directory in /home/yoholmes/public_html/logs/thiefinder.php on line 84

    This isn’t actually a big deal, since it still works fine (even though it’s ignoring my blacklist now)

    Is it possible to send me the original script (1.0)? That seemed to work better. :cool:

  13. Jeremy Steele Says:

    If it is ignoring your blacklist something is messed up with your htaccess, which is parsed by apache before the script is even executed. if you want just e-mail me it and I’ll take a look ( admin@nusuni.com ).

    I just figured out why you are getting the warning, I’m forgetting to urldecode the filename before the script loads in the image.

    Just updated the script with the urldecode addition on line 29

  14. Jonathan Says:

    Thanks for the code.

    In to-do you mentioned not listing hits from search engines. I just modified the htaccess argument like so to do that:

    
    RewriteEngine on

    RewriteBase /

    #Rewrite info for Thiefinder
    #is the referer this site?
    RewriteCond %{HTTP_REFERER} !^http(s)?://(.+\.)?example\.com [NC]
    #is it blank?
    RewriteCond %{HTTP_REFERER} !^$
    #is it from a search engine?
    RewriteCond %{HTTP_REFERER} !^http(s)?://(.*\.)?google\.com [NC]
    RewriteCond %{HTTP_REFERER} !^http(s)?://(.*\.)?yahoo\.com [NC]
    RewriteCond %{HTTP_REFERER} !^http(s)?://(.*\.)?live\.com [NC]
    #if another site is referring to this page, do rewrite magic
    RewriteRule \.(jpg|jpeg|png|gif)$ /path/to/thiefinder.php [NC,L]
    

  15. Jeremy Steele Says:

    Just replied to your email, Jonathan.

    Feeling a bit sick (only got an hour of sleep last night :shock: ) so I’m not even going to try and upload a modified version till tomorrow. Would probably upload the wrong file or something lol.

    Thanks for your help

  16. Thiefinder 1.03 Released Says:

    […] wanted to let you know I just uploaded Thiefinder 1.03. Some spelling mistakes were corrected and it now has support for ico and bmp files. Instructions […]

  17. Apparently Network Solutions Doesn’t Like Following The DMCA Procedure Says:

    […] illegally hosted on a Network Solutions server. The splog was also stealing an image, so of course Thiefinder picked that up. I noticed a Network Solutions IP visited shortly after filing the notice, but all […]

  18. DMCA Guide For Bloggers: How To Give Sploggers A Run For Their Money | Blogging Bits Says:

    […] being hotlinked from splogs. I highly recommend using a little open source script I wrote called Thiefinder. I made Thiefinder after seeing an odd increase in my bandwidth consumption. 3 seconds after […]

  19. Thiefinder Updated Says:

    […] just updated Thiefinder and it will now stop executing if the requested image doesn’t exist. Before it would send the […]

  20. Htaccess Generator 1.0b1 Released! Says:

    […] Just a few notes: if you are a blogger or offer any type of RSS feed I highly recommend leaving the default “deny access” option. If you only allow access from certain domains your RSS readers may not be able to view your images in their readers. But by using deny access you can block hotlinkers (you can find out who is hotlinking to you by using a script like Thiefinder) […]

  21. Hotlinking, SEO and BackLinks, Oh My! | The Habitation of Justice Says:

    […] In the meantime, it looks like Thiefinder has a cool little PHP script that can save considerable time in checking your logs for hotlinking. If it works it would be a […]

  22. Virtual Hosting Blog » Take it Back! 100 Tips to Defeat Content Thieves Says:

    […] Thiefinder: With this script, you can find possible bandwidth thieves. […]

  23. Ted Says:

    How do I tell if it’s working? I just get a white screen…

  24. Jeremy Steele Says:

    The log will fill up after the images are loaded from sites you don’t white list. If it’s blank now it won’t be after such a connection.
