Ban robots from website
Question
My website is often down because a spider is accessing too many resources. That is what my hosting provider told me, and they told me to ban these IP addresses:

46.229.164.98
46.229.164.100
46.229.164.101
But I have no idea how to do this.
I've googled a bit, and I've now added these lines to the .htaccess file in the root:
# allow all except those indicated here
<Files *>
order allow,deny
allow from all
deny from 46.229.164.98
deny from 46.229.164.100
deny from 46.229.164.101
</Files>
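Note that `Order`/`Allow`/`Deny` is the Apache 2.2 syntax. If the host runs Apache 2.4, the equivalent would use `Require` directives instead (a sketch, assuming mod_authz_core is loaded; the IPs are the ones from the question):

```apache
# Apache 2.4: allow everyone except the listed addresses
<RequireAll>
    Require all granted
    Require not ip 46.229.164.98
    Require not ip 46.229.164.100
    Require not ip 46.229.164.101
</RequireAll>
```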
Is this 100% correct? What else could I do? Please help me; I really have no idea what I should do.
Answer
Based on these reports:

https://www.projecthoneypot.org/ip_46.229.164.98
https://www.projecthoneypot.org/ip_46.229.164.100
https://www.projecthoneypot.org/ip_46.229.164.101
it looks like the bot is http://www.semrush.com/bot.html
If that is actually the bot, their page says:
To remove our bot from crawling your site simply insert the following lines to your
"robots.txt" file:
User-agent: SemrushBot
Disallow: /
Of course, that does not guarantee that the bot will obey the rules. You can block it in several ways; .htaccess is one of them, just like you did.
You can also use this little trick: deny ANY IP address that has "SemrushBot" in its User-Agent string:
Options +FollowSymlinks
RewriteEngine On
RewriteBase /
SetEnvIfNoCase User-Agent "^SemrushBot" bad_user
SetEnvIfNoCase User-Agent "^WhateverElseBadUserAgentHere" bad_user
Deny from env=bad_user
This way you will also block other IPs that the bot may use.
See more on blocking by User-Agent string: https://stackoverflow.com/a/7372572/953684
I should add that if your site is brought down by a spider, it usually means you have a badly written script or a very weak server.
The line

SetEnvIfNoCase User-Agent "^SemrushBot" bad_user

matches if the User-Agent begins with the string SemrushBot (the caret ^ means "beginning with"). If you want to search for SemrushBot ANYWHERE in the User-Agent string, simply remove the caret so it becomes:

SetEnvIfNoCase User-Agent "SemrushBot" bad_user

The above matches if the User-Agent contains the string SemrushBot anywhere (yes, no need for .*).
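The difference between the anchored and unanchored patterns can be checked quickly from a shell with grep's case-insensitive matching (a sketch; the two User-Agent strings below are made-up examples, not captured traffic):

```shell
# A UA that starts with the bot name, and one where it appears mid-string
ua1="SemrushBot/7~bl; http://www.semrush.com/bot.html"
ua2="Mozilla/5.0 (compatible; SemrushBot/7~bl)"

# Anchored: matches only when the UA *begins* with SemrushBot
echo "$ua1" | grep -qiE '^SemrushBot' && echo "ua1 anchored: match"
echo "$ua2" | grep -qiE '^SemrushBot' || echo "ua2 anchored: no match"

# Unanchored: matches SemrushBot anywhere in the string
echo "$ua2" | grep -qiE 'SemrushBot' && echo "ua2 unanchored: match"
```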