Block or ban ip addresses/bots?

 
Aprelium Forum Index -> General Questions
jrdeahl


Joined: 27 Dec 2006
Posts: 50

PostPosted: Mon Nov 12, 2007 6:06 am    Post subject: Block or ban ip addresses/bots?

I have some IP addresses and bots hitting my server, sometimes with multiple connections each. How do I stop this crap? Sometimes a Googlebot will have 4 or 5 connections from the same IP address.

Chewing up my bandwidth.

Thanks,

john
AbyssUnderground


Joined: 31 Dec 2004
Posts: 3855

PostPosted: Mon Nov 12, 2007 9:19 am    Post subject: Re: Block or ban ip addresses/bots?

jrdeahl wrote:
I have some IP addresses and bots hitting my server, sometimes with multiple connections each. How do I stop this crap? Sometimes a Googlebot will have 4 or 5 connections from the same IP address.

Chewing up my bandwidth.

Thanks,

john


Googlebot won't eat your bandwidth. It fetches about one page every 3-5 seconds, acting like a regular person browsing your site. I wouldn't worry about it; just think of it as another regular visitor. Multiple connections are normal, too: a browser will often open more than one to download content faster, and you would make life difficult for a browsing user if you reduced this.

For example, if you limit visitors to one connection and you offer file downloads, then as soon as someone starts downloading a file they can no longer browse your site, and they will leave thinking your server has gone down. Not good practice.

My advice is to leave it alone unless it becomes a real problem (like a DoS/DDoS attack), which is rare anyway.
_________________
Andy (AbyssUnderground) (previously The Inquisitor)
www.abyssunderground.co.uk
jrdeahl


Joined: 27 Dec 2006
Posts: 50

PostPosted: Mon Nov 12, 2007 4:31 pm

Last night a Googlebot had 14 simultaneous connections and was using 45K of bandwidth for an hour and a half.

Today I woke up to a Googlebot on the same IP with 6 connections and a Yahoo crawler with 3 connections. Together they are using a steady 41K.

Besides having over 3,000 files for download, I have a forum. I closed the forum because the damn bots indexed some pages that were not on the main menu, and I got so many spammers that I had to shut it down.

I hate those damn bots.

I would prefer to be able to stop the bots.
Moxxnixx


Joined: 21 Jun 2003
Posts: 1226
Location: Florida

PostPosted: Mon Nov 12, 2007 6:16 pm

jrdeahl,
If you want to stop Googlebot completely, add a robots.txt file to your site's root directory.

Code:
User-agent: Googlebot
Disallow: /

Be advised, your search engine ranking will eventually drop if you use this.
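
If the goal is only to reduce the load rather than to disappear from search results entirely, a softer option (not mentioned in this thread, so treat it as a suggestion to verify) is a Crawl-delay directive in the same robots.txt. Googlebot ignores Crawl-delay — its crawl rate is adjusted through Google's webmaster tools instead — but Yahoo's crawler, which identifies itself as Slurp, honors it. For example, to ask Slurp to wait 10 seconds between requests:

Code:
User-agent: Slurp
Crawl-delay: 10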
AbyssUnderground


Joined: 31 Dec 2004
Posts: 3855

PostPosted: Mon Nov 12, 2007 8:29 pm

Googlebot is there to promote your site. It does use a fair amount of bandwidth the first time it indexes your site, and forums get indexed as well. If you don't want it indexed, then as said above, use a robots.txt.

They won't have been using a steady flow of bandwidth, because they only hit your site once every 3-5 seconds.
jrdeahl


Joined: 27 Dec 2006
Posts: 50

PostPosted: Mon Nov 12, 2007 10:32 pm

What the heck are you talking about?

"As said above, use a robots.txt" — that was not mentioned above!
pkSML


Joined: 29 May 2006
Posts: 955
Location: Michigan, USA

PostPosted: Mon Nov 12, 2007 10:34 pm

Here's a sure-fire way to keep bots out of just a certain area of your site.

robots.txt is generally followed, but URL rewriting will guarantee no crawling.

URL rewriting:
  • virtual path regex: ^/no_allow/(.*)
  • condition:
    • variable: HTTP header: user-agent
    • operator: matches with
    • regex: Googlebot
    • case-sensitive: unchecked
  • if rule matches: report an error to the client
  • status code: whatever you want, but maybe a 401 - Unauthorized


Create another rule, just changing the regex for the condition to Yahoo to block Yahoo's crawler.
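
The steps above can be sketched in code. The following is a hypothetical Python model of the rule's matching logic — not Abyss's actual implementation — where the `/no_allow/` path, the `Googlebot` pattern, and the 401 outcome follow the settings listed:

```python
import re

# Sketch of the URL-rewriting rule: requests whose virtual path falls
# under /no_allow/ AND whose User-Agent header matches "Googlebot"
# (case-insensitively) would be answered with an error (e.g. 401)
# instead of the page content.
PROTECTED_PATH = re.compile(r"^/no_allow/(.*)")
BLOCKED_UA = re.compile(r"Googlebot", re.IGNORECASE)

def should_block(path: str, user_agent: str) -> bool:
    """Return True when the request matches the rule and gets the error."""
    return bool(PROTECTED_PATH.match(path) and BLOCKED_UA.search(user_agent))

print(should_block("/no_allow/page.html",
                   "Mozilla/5.0 (compatible; Googlebot/2.1)"))  # True
print(should_block("/index.html",
                   "Mozilla/5.0 (compatible; Googlebot/2.1)"))  # False
print(should_block("/no_allow/page.html", "Mozilla/5.0 Firefox"))  # False
```

A second rule for Yahoo would simply swap the `Googlebot` pattern for a `Yahoo` one, mirroring the "create another rule" step below.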

Several months ago, I had GoogleBot crawling Abyss on my desktop computer, which had access to my entire C drive. Oops!

BTW, I'll put in a plug for my site. Please register your Abyss-powered websites at http://abyss-websites.com
None of the old sites are there anymore due to the complete redesign of the site. Thanks!

_________________
Stephen
Need a LitlURL?


http://CodeBin.yi.org
Moxxnixx


Joined: 21 Jun 2003
Posts: 1226
Location: Florida

PostPosted: Mon Nov 12, 2007 10:50 pm

jrdeahl wrote:
What the heck are you talking about?

"as said above use a robots.txt." Was not mentioned above!

Read the post "above" his.
jrdeahl


Joined: 27 Dec 2006
Posts: 50

PostPosted: Mon Nov 12, 2007 11:41 pm

That's weird, it wasn't showing up before.
Powered by phpBB phpBB Group