Ban robots from website


Problem description

My website is often down because a spider is accessing too many resources. This is what my hosting company told me. They told me to ban these IP addresses: 46.229.164.98 46.229.164.100 46.229.164.101

But I have no idea how to do this.

I've googled a bit and have now added these lines to the .htaccess file in the root:

# allow all except those indicated here
<Files *>
Order Allow,Deny
Allow from all
Deny from 46.229.164.98
Deny from 46.229.164.100
Deny from 46.229.164.101
</Files>

Is this 100% correct? What else could I do? Please help me; I really have no idea what I should do.

Solution

Based on these:

https://www.projecthoneypot.org/ip_46.229.164.98
https://www.projecthoneypot.org/ip_46.229.164.100
https://www.projecthoneypot.org/ip_46.229.164.101

it looks like the bot is http://www.semrush.com/bot.html

If that's actually the robot, their page says:

To remove our bot from crawling your site simply insert the following lines to your
"robots.txt" file:

User-agent: SemrushBot
Disallow: /

Of course that does not guarantee that the bot will obey the rules. You can block it in several ways; .htaccess is one of them, just like you did.
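
One caveat: Order/Allow/Deny is Apache 2.2 syntax, and Apache 2.4 only honours it through mod_access_compat. If your server runs Apache 2.4, a minimal sketch of the same IP ban in the newer mod_authz_core syntax (using the IP list from the question) would be:

<RequireAll>
    # allow everyone except the spider's IP addresses
    Require all granted
    Require not ip 46.229.164.98
    Require not ip 46.229.164.100
    Require not ip 46.229.164.101
</RequireAll>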

You can also use this little trick: deny ANY IP address that has "SemrushBot" in its User-Agent string:

Options +FollowSymlinks
RewriteEngine On
RewriteBase /
# flag any request whose User-Agent starts with one of these strings
SetEnvIfNoCase User-Agent "^SemrushBot" bad_user
SetEnvIfNoCase User-Agent "^WhateverElseBadUserAgentHere" bad_user
# deny every request that was flagged above
Deny from env=bad_user

This way you will also block other IPs that the bot may use.

See more on blocking by User-Agent string: https://stackoverflow.com/a/7372572/953684
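
If you prefer mod_rewrite (which the snippet above already enables with RewriteEngine On), a rough equivalent of the SetEnvIf trick is the sketch below, assuming you want to answer such requests with 403 Forbidden:

RewriteEngine On
# send 403 Forbidden to any request whose User-Agent contains "SemrushBot" (case-insensitive)
RewriteCond %{HTTP_USER_AGENT} SemrushBot [NC]
RewriteRule ^ - [F,L]

Note that, unlike the anchored SetEnvIfNoCase line above, this condition matches the substring anywhere in the User-Agent.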

I should add that if your site is brought down by a spider, it usually means you have a badly written script or a very weak server.

The line

SetEnvIfNoCase User-Agent "^SemrushBot" bad_user

tries to match only if the User-Agent string begins with SemrushBot (the caret ^ means "begins with"). If you want to match SemrushBot ANYWHERE in the User-Agent string, simply remove the caret so it becomes:

SetEnvIfNoCase User-Agent "SemrushBot" bad_user

The above matches if the User-Agent contains the string SemrushBot anywhere (and yes, there is no need for .*).

