Is this Robots.txt file correct?

Question

I have been getting a lot of CPU spikes on my server recently, and I believe the traffic, or some part of it, isn't real. So for now I want to allow only the Google, MSN, and Yahoo bots. Please advise whether the following robots.txt file is correct for my requirement.

User-agent: Googlebot
User-agent: Slurp
User-agent: msnbot 
User-agent: Mediapartners-Google*
User-agent: Googlebot-Image 
User-agent: Yahoo-MMCrawler
Disallow: 

User-agent: *
Disallow: /

Thanks.

Solution

Your robots.txt seems to be valid.

  • A record may contain several User-agent lines.
  • An empty Disallow: allows crawling everything.
  • The record starting with User-agent: * applies only to bots not matched by the previous record.
  • Disallow: / forbids crawling anything.
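
These semantics can be confirmed with a minimal sketch using Python's standard-library urllib.robotparser. The URL example.com and the bot name SomeOtherBot are placeholders introduced here for illustration:

import urllib.robotparser

# The robots.txt from the question, verbatim.
ROBOTS_TXT = """\
User-agent: Googlebot
User-agent: Slurp
User-agent: msnbot
User-agent: Mediapartners-Google*
User-agent: Googlebot-Image
User-agent: Yahoo-MMCrawler
Disallow:

User-agent: *
Disallow: /
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Bots named in the first record may fetch everything (empty Disallow).
print(parser.can_fetch("Googlebot", "https://example.com/any/page"))  # True
print(parser.can_fetch("Slurp", "https://example.com/any/page"))      # True

# Any other bot falls through to the catch-all record and is blocked.
print(parser.can_fetch("SomeOtherBot", "https://example.com/"))       # False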

But note: only nice bots follow the rules in robots.txt, and nice bots are unlikely to exceed common crawling frequencies. So either you need to work on your server's performance, or not-so-nice bots are to blame.
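
If you want to see who is actually hitting the server, the access log is more telling than robots.txt, because bots that ignore robots.txt still leave log entries. Below is a rough sketch that counts requests per User-Agent; the log path is a hypothetical example and the regex assumes the combined log format, so adjust both for your setup.

import re
from collections import Counter

LOG_PATH = "/var/log/apache2/access.log"  # assumption: use your server's log path

# In the combined log format, the User-Agent is the last quoted field.
UA_PATTERN = re.compile(r'"[^"]*" "([^"]*)"$')

counts = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = UA_PATTERN.search(line.rstrip())
        if match:
            counts[match.group(1)] += 1

# Print the heaviest user agents; compare them against the bots you allowed.
for agent, hits in counts.most_common(10):
    print(f"{hits:7d}  {agent}")

Agents near the top of this list that are not among your allowed bots are candidates for blocking at the web-server or firewall level rather than in robots.txt.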
