How to disallow all dynamic URLs in robots.txt

Question

How do I disallow all dynamic URLs in robots.txt?

Disallow: /?q=admin/
Disallow: /?q=aggregator/
Disallow: /?q=comment/reply/
Disallow: /?q=contact/
Disallow: /?q=logout/
Disallow: /?q=node/add/
Disallow: /?q=search/
Disallow: /?q=user/password/
Disallow: /?q=user/register/
Disallow: /?q=user/login/

I want to disallow everything that starts with /?q=.

Recommended answer

The answer to your question is to use

Disallow: /?q=
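To check that this one line covers the whole family of ?q= URLs, here is a minimal sketch using Python's standard-library `urllib.robotparser`, which applies plain prefix matching to Disallow values. The host `example.com`, the agent name `ExampleBot`, and the sample URLs are placeholders chosen for illustration; they are not part of the original question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt containing only the single prefix rule.
ROBOTS_TXT = """\
User-agent: *
Disallow: /?q=
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() takes an iterable of lines

# example.com and "ExampleBot" are placeholders.
urls = [
    "http://example.com/?q=admin/",        # one of the original ten rules
    "http://example.com/?q=user/login/",   # another original rule
    "http://example.com/?q=node/42/edit",  # not in the original list, still starts with /?q=
    "http://example.com/",                 # home page has no ?q= prefix
    "http://example.com/node/42",          # clean URL without ?q=
]
for url in urls:
    print(url, "allowed" if parser.can_fetch("ExampleBot", url) else "disallowed")
```

The first three URLs are reported as disallowed and the last two as allowed, because only the first three begin with `/?q=`.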

The best (currently accessible) source on robots.txt I could find is on Wikipedia. (The supposedly definitive source is http://www.robotstxt.org, but the site is down at the moment.)

According to the Wikipedia page, the standard defines just two fields: User-agent: and Disallow:. The Disallow: field does not allow explicit wildcards, but each "disallowed" path is actually a path prefix, i.e. it matches any path that starts with the specified value.
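The prefix semantics can also be sketched by hand. The helper `is_disallowed` below is hypothetical (not taken from any library) and assumes plain string-prefix matching with no wildcard handling. It shows that each of the ten rules from the question is itself covered by `/?q=`, so the single rule never unblocks anything the old list blocked, while other `?q=` paths become blocked as well.

```python
from typing import Iterable

# The ten rules from the question.
ORIGINAL_RULES = [
    "/?q=admin/",
    "/?q=aggregator/",
    "/?q=comment/reply/",
    "/?q=contact/",
    "/?q=logout/",
    "/?q=node/add/",
    "/?q=search/",
    "/?q=user/password/",
    "/?q=user/register/",
    "/?q=user/login/",
]
NEW_RULES = ["/?q="]


def is_disallowed(path: str, disallow_values: Iterable[str]) -> bool:
    """Hypothetical helper: True if path starts with any non-empty Disallow value.

    An empty value matches nothing, since an empty Disallow: line allows everything.
    """
    return any(value != "" and path.startswith(value) for value in disallow_values)


# Every old rule starts with "/?q=", so anything it matched is still matched.
print(all(is_disallowed(rule, NEW_RULES) for rule in ORIGINAL_RULES))
# A ?q= path the old list missed is now blocked too.
print(is_disallowed("/?q=taxonomy/term/1", ORIGINAL_RULES), is_disallowed("/?q=taxonomy/term/1", NEW_RULES))
# Paths without the prefix are unaffected.
print(is_disallowed("/node/1", NEW_RULES))
```

The reasoning behind the first check is transitivity of prefixes: if a path starts with `/?q=admin/` and `/?q=admin/` starts with `/?q=`, then the path starts with `/?q=`.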

The Allow: field is a non-standard extension, and any support for explicit wildcards in Disallow would be a non-standard extension. If you use these, you have no right to expect that a (legitimate) web crawler will understand them.

This is not a matter of crawlers being "smart" or "dumb": it is all about standards compliance and interoperability. For example, any web crawler that did "smart" things with explicit wildcard characters in a "Disallow:" would be bad for (hypothetical) robots.txt files where those characters were intended to be interpreted literally.
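The interoperability point can be illustrated with two hypothetical matchers: `prefix_match` follows the literal prefix reading, while `wildcard_match` models a crawler that treats `*` as "any run of characters". The rule `/docs/*` and the sample paths are invented for illustration, assuming a site whose author meant the `*` literally.

```python
import re


def prefix_match(path: str, rule: str) -> bool:
    """Literal reading: '*' is an ordinary character in the prefix."""
    return path.startswith(rule)


def wildcard_match(path: str, rule: str) -> bool:
    """Hypothetical 'smart' reading: '*' matches any run of characters.

    re.match anchors at the start of the path but not at the end, so apart
    from the wildcard this still behaves like a prefix match.
    """
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.match(pattern, path) is not None


# The author intended this rule to block only paths under the literal "/docs/*".
RULE = "/docs/*"

for path in ["/docs/*", "/docs/*/index.html", "/docs/intro.html"]:
    print(path, "literal:", prefix_match(path, RULE), "wildcard:", wildcard_match(path, RULE))
```

Only `/docs/intro.html` differs: the literal reader lets it through, while the wildcard reader blocks it even though the author never intended that.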