Allow and Disallow in Robots.txt


Problem Description

http://www.robotstxt.org/orig.html says:

Disallow: /help disallows both /help.html and /help/index.html

Now, google.com/robots.txt lists:

Disallow: /search  
Allow: /search/about  

Upon running robotparser.py, it returns False for both of the above cases in Google's robots.txt.

Would somebody please explain the use of Allow in Allow: /search/about, given that it returns False based on the Disallow entry above it?
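
For reference, the observation can be reproduced without fetching Google's live file. Below is a minimal sketch that feeds the two rules directly to Python 3's urllib.robotparser (the live google.com/robots.txt may have changed since the question was asked). With these rules, both checks print False, because CPython's parser applies the first matching rule in file order, so the Disallow prefix shadows the later Allow line:

from urllib.robotparser import RobotFileParser

# parse() accepts a list of lines, so the rules can be supplied inline.
parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /search",
    "Allow: /search/about",
])

# "Disallow: /search" is a prefix match for both paths and comes first,
# so both calls report False here.
print(parser.can_fetch("*", "https://www.google.com/search"))        # False
print(parser.can_fetch("*", "https://www.google.com/search/about"))  # False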

Recommended Answer

The module documentation for robotparser and its Python 3 counterpart, urllib.robotparser, mentions that they use the original specification. That specification has no Allow directive; it is a non-standard extension. Some major crawlers support it, but you (obviously) don't have to support it to claim compliance.
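
By contrast, crawlers that do support the Allow extension (Google's documentation, and later RFC 9309, describe it) resolve conflicting rules by specificity: the longest matching prefix wins, with Allow taking precedence on a tie. A hypothetical sketch of that precedence rule, ignoring wildcards and other extensions:

def is_allowed(path, rules):
    """rules: (directive, prefix) pairs in file order."""
    best_len = -1
    allowed = True  # no matching rule means the path is allowed
    for directive, prefix in rules:
        if not path.startswith(prefix):
            continue
        # Longest match wins; on a tie, Allow takes precedence.
        if len(prefix) > best_len or (len(prefix) == best_len and directive == "allow"):
            best_len = len(prefix)
            allowed = directive == "allow"
    return allowed

rules = [("disallow", "/search"), ("allow", "/search/about")]
print(is_allowed("/search", rules))        # False: only Disallow matches
print(is_allowed("/search/about", rules))  # True: the longer Allow wins

Under that rule, Google's Allow: /search/about line is meaningful: extension-aware crawlers may fetch /search/about even though the rest of /search is disallowed.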
