briosky
Joined: 18 Jun 2005 Posts: 46 Location: Salt Lake City, UT
Posted: Wed Sep 21, 2005 5:58 am    Post subject: Garbage in the log

Every darn time that

"msnbot/1.0 (+http://search.msn.com/msnbot.htm)"

visits my site, I find about 4 MB of trash in the access.log file.

Other than tweaking the robots.txt file to lock out the Redmond geniuses, any ideas, please?

-- the_wasatch_dude
_________________
Brionews - Everyday Freshnews http://brionews.com
			 
abyssisthebest
Joined: 30 Jun 2005 Posts: 319 Location: Boston, UK
Posted: Wed Sep 21, 2005 7:24 am

Find out the IP address of the MSN bot and add it to the do-not-log list.
_________________
My online Portfolio
			 
briosky
Joined: 18 Jun 2005 Posts: 46 Location: Salt Lake City, UT
Posted: Wed Sep 21, 2005 4:57 pm

Quote:
find out the ip of the msn bot and add it to the do not log list

abyssisthebest:

I can't do that.

The log file is processed every day by a stats analyzer (a Perl script, just as an example), so the stats would miss the MSN visits.

No big deal, of course, but I would rather discover why the trash appears and then find a solution.

I appreciate the help, though.
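
One way to see how much of the log a given bot actually accounts for is to total the log text per user agent. Below is a rough Python sketch of that idea. It assumes Apache-style "combined" log lines where the user agent is the last double-quoted field; the function name `log_bytes_by_agent` and the sample lines are made up for illustration and are not from a real log.

```python
import re
from collections import Counter

# Assumption: "combined" log format, user agent is the last quoted field.
# Adjust the pattern if your server writes a different format.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

def log_bytes_by_agent(lines):
    """Return a Counter mapping user agent -> bytes of log text it produced."""
    totals = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        agent = match.group(1) if match else "(unparsed)"
        totals[agent] += len(line.encode("utf-8"))
    return totals

# Hypothetical sample lines for illustration only.
sample = [
    '192.0.2.1 - - [21/Sep/2005:05:58:00 -0600] "GET / HTTP/1.1" 200 512 '
    '"-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"\n',
    '192.0.2.7 - - [21/Sep/2005:05:59:10 -0600] "GET /a.html HTTP/1.1" 200 2048 '
    '"-" "Mozilla/5.0"\n',
]

totals = log_bytes_by_agent(sample)
for agent, size in totals.most_common():
    print(f"{size:>8} bytes  {agent}")
```

To run it against a real file, pass the open file object instead of `sample`, for example `log_bytes_by_agent(open("access.log", encoding="utf-8", errors="replace"))`. Lines that land under "(unparsed)" are the ones worth reading by eye.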
 
			 
TRUSTAbyss
Joined: 29 Oct 2003 Posts: 3752 Location: USA, GA
Posted: Wed Sep 21, 2005 6:28 pm

You will find that lots of bots visit your website. Every part of your log file is important, and you shouldn't worry about what's in it. You should be glad it's there, because that tells you that logging is working.

The only things you should worry about are the error requests.

Sincerely, TRUSTpunk
			 
chance
Joined: 04 Jan 2003 Posts: 27 Location: everett, wa
Posted: Sat Oct 22, 2005 7:22 am

Take another look at AWStats and see how many referrals you get from the MSN bot. I found almost none, but the bot was eating up bandwidth every day. Since they weren't sending me any hits, I banned them (along with quite a few others who were garbaging up the logs).

robots.txt handles the well-mannered bots pretty well, but I also use Sygate firewall, which has advanced rules to block IPs and IP blocks.
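
For anyone wondering what "block IPs and IP blocks" boils down to, here is a small Python sketch of the matching logic only. It is not Sygate's rule syntax. The `BLOCKED_NETWORKS` list and the `is_blocked` helper are hypothetical, and the addresses are documentation-only ranges, not real bot addresses.

```python
import ipaddress

# Hypothetical blocklist using documentation-only ranges (RFC 5737).
# Replace these with the addresses you actually see in your own logs.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),      # a whole IP block
    ipaddress.ip_network("198.51.100.17/32"),  # a single IP
]

def is_blocked(address):
    """Return True if the address falls inside any blocked network."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in BLOCKED_NETWORKS)

print(is_blocked("192.0.2.55"))   # inside the /24 block
print(is_blocked("203.0.113.9"))  # not on the list
```

A firewall does the same comparison for you before the request ever reaches the web server, which is why those hits stop showing up in the log at all.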
			 
aprelium
Joined: 22 Mar 2002 Posts: 6800
Posted: Sat Oct 22, 2005 3:12 pm

A simple way to keep well-behaved bots off your site is to put a robots.txt file in its root.
 
 
For example, if you want to forbid msnbot from crawling your site, the robots.txt file should contain the following:
 
 
Code:
User-agent: msnbot
Disallow: /
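
If you want to check the rules before uploading the file, one option is Python's standard `urllib.robotparser` module. The sketch below parses the same two lines and asks whether a given user agent may fetch a URL. The `example.com` URLs and the `SomeOtherBot/2.0` agent are placeholders chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the robots.txt example above.
ROBOTS_TXT = """\
User-agent: msnbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

MSNBOT = "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"

# msnbot is matched by the record and denied everything under "/".
print(parser.can_fetch(MSNBOT, "http://example.com/index.html"))

# Placeholder agent with no matching record, so it is allowed.
print(parser.can_fetch("SomeOtherBot/2.0", "http://example.com/"))
```

Keep in mind this only tells you how a compliant parser reads the file. A bot that ignores robots.txt will not be stopped by it.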
 
 
For more information, refer to http://www.robotstxt.org/.
_________________
Support Team
Aprelium - http://www.aprelium.com
			 