Disallow dynamic URL in robots.txt
Problem description
Our URL is:
http://example.com/kitchen-knife/collection/maitre-universal-cutting-boards-rana-parsley-chopper-cheese-slicer-vegetables-knife-sharpening-stone-ham-stand-ham-stand-riviera-niza-knives-block-benin.html
I want to disallow URLs from being crawled after collection, but the categories that appear before collection are generated dynamically.

How would I disallow URLs in robots.txt after /collection?
Recommended answer
This is not possible in the original robots.txt specification.
But some (!) parsers extend the specification and define a wildcard character (typically *).
For those parsers, you could use:
Disallow: /*/collection/
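Note that a Disallow line only takes effect inside a group, so a minimal robots.txt using this rule might look like the following (the User-agent: * line targeting all crawlers is an assumption — narrow it if you only care about specific bots):

```
User-agent: *
Disallow: /*/collection/
```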
Parsers that understand * as a wildcard will stop crawling any URL whose path starts with anything (which may be nothing), followed by /collection/, followed by anything, e.g.:
http://example.com/foo/collection/
http://example.com/foo/collection/bar
http://example.com/collection/
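To make the wildcard behavior concrete, here is a minimal sketch of such a matcher — not any particular crawler's implementation — that expands * into a regular expression and checks the rule against a URL path:

```python
import re

def wildcard_rule_matches(rule: str, path: str) -> bool:
    """Return True if a robots.txt rule containing '*' wildcards
    matches the beginning of the given URL path."""
    # '*' matches any run of characters; everything else is literal text.
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in rule)
    return re.match(regex, path) is not None

print(wildcard_rule_matches("/*/collection/", "/foo/collection/"))     # True
print(wildcard_rule_matches("/*/collection/", "/foo/collection/bar"))  # True
print(wildcard_rule_matches("/*/collection/", "/foo/knives/"))         # False
```

Crawlers that support wildcards implement matching along these lines, but the details vary between crawlers, so always verify the rule with the crawler's own testing tools.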
Parsers that don't understand * as a wildcard (i.e., they follow the original specification) will stop crawling any URL whose path starts with the literal string /*/collection/, e.g.:
http://example.com/*/collection/
http://example.com/*/collection/bar
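You can observe this literal behavior with Python's standard-library urllib.robotparser, which follows the original specification and does not expand wildcards (mybot is just a placeholder user-agent string):

```python
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /*/collection/",
])
rp.modified()  # mark the rules as loaded so can_fetch gives real answers

# The '*' is matched literally: only paths that really start with
# "/*/collection/" are blocked.
print(rp.can_fetch("mybot", "http://example.com/*/collection/bar"))  # False (blocked)
print(rp.can_fetch("mybot", "http://example.com/foo/collection/"))   # True (still crawlable)
```

This is exactly why the wildcard rule above only helps with crawlers whose parsers extend the original specification.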