Is there any advantage of using X-Robot-Tag instead of robots.txt?
Question
It looks like there are two mainstream solutions for instructing crawlers what to index and what not to index: adding an X-Robot-Tag HTTP header, or providing a robots.txt file.
Is there any advantage to using the former?
Recommended answer
With robots.txt, you cannot disallow indexing of your documents.
They have different purposes:
robots.txt can disallow crawling (with Disallow)
X-Robots-Tag¹ can disallow indexing (with noindex)
(Both also offer other, different features, e.g., linking to your Sitemap in robots.txt, disallowing the following of links in X-Robots-Tag, and many more.)
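To make the crawling side concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `example.com` URLs, and the bot name `ExampleBot` are made up for illustration. It only shows how a well-behaved crawler might interpret `Disallow` and `Sitemap`; it says nothing about whether a URL ends up in an index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


robots = build_parser(ROBOTS_TXT)

# Disallow controls crawling (fetching), not indexing.
may_crawl_private = robots.can_fetch(
    "ExampleBot", "https://example.com/private/report.html"
)
may_crawl_public = robots.can_fetch("ExampleBot", "https://example.com/index.html")

# The Sitemap line is one of the extra features of robots.txt (Python 3.8+).
sitemaps = robots.site_maps()

print(may_crawl_private, may_crawl_public, sitemaps)
```

Note that nothing in this check can express "do not index": the parser only answers whether a URL may be fetched.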
Crawling means accessing the document. Indexing means providing a link to (and possibly metadata from or about) the document in an index. In the typical case, a bot indexes a document after having crawled it, but that's not necessary.
A bot that isn't allowed to crawl a document may still index it (without ever accessing it), for example based on links pointing to it. A bot that isn't allowed to index a document may still crawl it. You can't disallow both at once: a bot has to crawl the document to see its X-Robots-Tag header, so a noindex on a document that robots.txt blocks is never read.
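The interaction between the two mechanisms can be sketched as a small decision function. This is a simplified model, not how any particular search engine is implemented: the function names are hypothetical, and the header parsing handles only a plain comma-separated directive list (it ignores bot-specific prefixes and other directives). The point it illustrates is that a bot must fetch a response before it can read a `noindex` header.

```python
from typing import Optional


def has_noindex(header_value: str) -> bool:
    """Check a plain, comma-separated X-Robots-Tag value for noindex."""
    directives = {part.strip().lower() for part in header_value.split(",")}
    return "noindex" in directives


def bot_sees_noindex(crawl_allowed: bool, x_robots_tag: Optional[str]) -> bool:
    """Return True only if the bot fetches the document and finds noindex."""
    if not crawl_allowed:
        # Blocked by robots.txt: the response (and its headers) is never read.
        return False
    return x_robots_tag is not None and has_noindex(x_robots_tag)


# Crawling allowed, header says noindex: the bot learns it must not index.
allowed_and_tagged = bot_sees_noindex(True, "noindex, nofollow")

# Crawling disallowed: the same header is never seen, so the URL could
# still be indexed from external links.
blocked_and_tagged = bot_sees_noindex(False, "noindex, nofollow")

# Crawling allowed, no header: nothing prevents indexing.
allowed_untagged = bot_sees_noindex(True, None)

print(allowed_and_tagged, blocked_and_tagged, allowed_untagged)
```

Under this model, a document you want kept out of an index has to remain crawlable so that its `noindex` can be read.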
¹ Note that the header is called X-Robots-Tag, not X-Robot-Tag. By the way, the metadata name robots (for the HTML meta element) is an alternative to the HTTP header.
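As an illustration of the meta element alternative, the following sketch reads robots directives from an HTML document with Python's standard `html.parser`. The class name `RobotsMetaParser` and the sample markup are hypothetical, and the splitting on commas is a simplification of how real crawlers interpret the `content` attribute.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # Split the comma-separated directive list and normalize case.
        self.directives.extend(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


# Sample page invented for this example.
SAMPLE_HTML = (
    "<html><head>"
    '<meta name="robots" content="noindex, nofollow">'
    "</head><body></body></html>"
)

meta_parser = RobotsMetaParser()
meta_parser.feed(SAMPLE_HTML)
meta_directives = meta_parser.directives

print(meta_directives)
```

The meta element only works for HTML documents, while the HTTP header can be sent with any kind of response, such as PDFs or images.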