Is this Robots.txt file correct?
Question
I have been getting a lot of CPU spikes recently on my server and somehow I believe it's not the real traffic or some part of it isn't real. So I want to only allow Google bots, MSN and Yahoo for now. Please guide me if the following robots.txt file is correct for my requirement.
```
User-agent: Googlebot
User-agent: Slurp
User-agent: msnbot
User-agent: Mediapartners-Google*
User-agent: Googlebot-Image
User-agent: Yahoo-MMCrawler
Disallow:

User-agent: *
Disallow: /
```
Thanks.
Your robots.txt seems to be valid.
- It is allowed to have several `User-agent` lines in a record. `Disallow:` (with an empty value) allows crawling everything.
- The record starting with `User-agent: *` only applies to bots not matched by the previous record. `Disallow: /` forbids crawling anything.
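The behavior described above can be checked with Python's standard-library robots.txt parser. The file content and bot names below come from the question; the URL paths are just illustrative examples:

```python
# Verify which crawlers the robots.txt from the question allows,
# using Python's built-in parser (urllib.robotparser).
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Googlebot
User-agent: Slurp
User-agent: msnbot
User-agent: Mediapartners-Google*
User-agent: Googlebot-Image
User-agent: Yahoo-MMCrawler
Disallow:

User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Bots listed in the first record may fetch any URL
# (an empty Disallow means "allow everything"):
print(rp.can_fetch("Googlebot", "/some/page"))  # True
print(rp.can_fetch("msnbot", "/index.html"))    # True

# Any other bot falls through to the "User-agent: *" record
# and is blocked entirely by "Disallow: /":
print(rp.can_fetch("SomeOtherBot", "/some/page"))  # False
```

Note that this only confirms how compliant parsers read the file; as the answer says, it does nothing against bots that ignore robots.txt.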
But note: only well-behaved bots follow the rules in robots.txt, and well-behaved bots are unlikely to crawl at excessive rates. So either you need to work on your server's performance, or not-so-well-behaved bots are to blame.