Disallow dynamic URL in robots.txt


Question

Our URL is:

http://example.com/kitchen-knife/collection/maitre-universal-cutting-boards-rana-parsley-chopper-cheese-slicer-vegetables-knife-sharpening-stone-ham-stand-ham-stand-riviera-niza-knives-block-benin.html

I want to disallow URLs from being crawled after collection, but the categories before collection are generated dynamically.

How would I disallow URLs in robots.txt after /collection?

Answer

This is not possible in the original robots.txt specification.

But some (!) parsers extend the specification and define a wildcard character (typically *).

For those parsers, you could use:

Disallow: /*/collection

Parsers that understand * as wildcard will stop crawling any URL whose path starts with anything (which may be nothing), followed by /collection/, followed by anything, e.g.,

http://example.com/foo/collection/
http://example.com/foo/collection/bar
http://example.com/collection/

Parsers that don’t understand * as a wildcard (i.e., they follow the original specification) will stop crawling any URL whose path starts with the literal string /*/collection/, e.g.,

http://example.com/*/collection/
http://example.com/*/collection/bar
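To make the difference between the two parser behaviors concrete, here is a small Python sketch (not part of the original answer; `path_matches` is a hypothetical helper) that contrasts wildcard-aware matching with the original literal-prefix matching:

```python
import re

def path_matches(rule: str, path: str, wildcard: bool) -> bool:
    """Check whether a Disallow rule matches a URL path.

    wildcard=True emulates parsers that treat '*' as "any sequence of
    characters" (the common extension); wildcard=False follows the
    original specification, where the rule is a literal path prefix.
    """
    if wildcard:
        # Translate the rule into a regex anchored at the path start:
        # '*' becomes '.*'; every other character is matched literally.
        pattern = "".join(".*" if ch == "*" else re.escape(ch) for ch in rule)
        return re.match(pattern, path) is not None
    # Original spec: plain literal prefix match, '*' has no special meaning.
    return path.startswith(rule)

rule = "/*/collection"
# Wildcard-aware parsers block dynamic category paths:
assert path_matches(rule, "/foo/collection/", wildcard=True)
assert path_matches(rule, "/foo/collection/bar", wildcard=True)
# Spec-only parsers block only paths literally beginning with "/*/collection":
assert not path_matches(rule, "/foo/collection/", wildcard=False)
assert path_matches(rule, "/*/collection/bar", wildcard=False)
```

This is why the rule above is safe but not universal: crawlers without the wildcard extension will simply treat `/*/collection` as an (almost certainly nonexistent) literal path and crawl the dynamic category URLs anyway.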
