Allow and Disallow in Robots.txt
Question
http://www.robotstxt.org/orig.html says:

Disallow: /help disallows both /help.html and /help/index.html
Now, google.com/robots.txt lists:
Disallow: /search
Allow: /search/about
Upon running robotparser.py, it returns false for both of the above cases in Google's robots.txt.
Would somebody please explain what the use of Allow in Allow: /search/about is, given that it returns false because of the Disallow entry above it?
Answer
The module documentation for robotparser and its Python 3 counterpart, urllib.robotparser, mentions that they follow the original specification. That specification has no Allow directive; Allow is a non-standard extension. Some major crawlers support it, but a parser (obviously) does not have to support it to claim compliance.
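The behavior the question describes can be reproduced with a short, self-contained sketch. This inlines the relevant excerpt of Google's robots.txt via parse() instead of fetching it over the network; the exact result may vary across Python versions, but on the CPython 3 interpreter tested here both lookups come back disallowed:

```python
from urllib.robotparser import RobotFileParser

# The relevant excerpt of google.com/robots.txt, inlined so the
# example does not depend on network access.
rules = """\
User-agent: *
Disallow: /search
Allow: /search/about
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Both paths are reported as disallowed, even though an Allow
# rule explicitly covers /search/about.
print(rp.can_fetch("*", "https://www.google.com/search"))
print(rp.can_fetch("*", "https://www.google.com/search/about"))
```

/search/about is blocked here because the Disallow: /search line matches it as well, and the parser does not give the more specific Allow rule precedence the way major crawlers do.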