Is this Robots.txt file correct?
Question
I have been getting a lot of CPU spikes recently on my server and somehow I believe it's not the real traffic or some part of it isn't real. So I want to only allow Google bots, MSN and Yahoo for now. Please guide me if the following robots.txt file is correct for my requirement.
```
User-agent: Googlebot
User-agent: Slurp
User-agent: msnbot
User-agent: Mediapartners-Google*
User-agent: Googlebot-Image
User-agent: Yahoo-MMCrawler
Disallow:

User-agent: *
Disallow: /
```
Thanks.
Your robots.txt seems to be valid.
- It is allowed to have several `User-agent` lines in a record. `Disallow:` (with an empty value) allows crawling everything.
- The record starting with `User-agent: *` only applies to bots not matched by the previous record. `Disallow: /` forbids crawling anything.
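The behavior described above can be checked with Python's standard-library robots.txt parser. The file content and bot names below come from the question; the URL paths are just illustrative examples:

```python
# Verify which crawlers the robots.txt from the question allows,
# using Python's built-in parser (urllib.robotparser).
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Googlebot
User-agent: Slurp
User-agent: msnbot
User-agent: Mediapartners-Google*
User-agent: Googlebot-Image
User-agent: Yahoo-MMCrawler
Disallow:

User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Bots listed in the first record may fetch any URL
# (an empty Disallow means "allow everything"):
print(rp.can_fetch("Googlebot", "/some/page"))  # True
print(rp.can_fetch("msnbot", "/index.html"))    # True

# Any other bot falls through to the "User-agent: *" record
# and is blocked entirely by "Disallow: /":
print(rp.can_fetch("SomeOtherBot", "/some/page"))  # False
```

Note that this only confirms how compliant parsers read the file; as the answer says, it does nothing against bots that ignore robots.txt.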
But note: only well-behaved bots follow the rules in robots.txt, and well-behaved bots are unlikely to crawl at excessive rates. So either you need to work on your server's performance, or not-so-well-behaved bots are to blame.