How to disallow all dynamic URLs in robots.txt
Problem description
How do I disallow all dynamic URLs in robots.txt?
Disallow: /?q=admin/
Disallow: /?q=aggregator/
Disallow: /?q=comment/reply/
Disallow: /?q=contact/
Disallow: /?q=logout/
Disallow: /?q=node/add/
Disallow: /?q=search/
Disallow: /?q=user/password/
Disallow: /?q=user/register/
Disallow: /?q=user/login/
I want to disallow everything that starts with /?q=.
Recommended answer
The answer to your question is to use
Disallow: /?q=
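For context, a minimal robots.txt built around this rule could look like the sketch below (the User-agent: * line is my assumption; narrow it to a specific crawler name if you only want to restrict certain bots):

User-agent: *
Disallow: /?q=

Because Disallow: values are matched as prefixes, this single line already covers /?q=admin/, /?q=search/, /?q=user/login/ and every other URL beginning with /?q=, so the ten separate rules listed in the question become redundant.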
The best (currently accessible) source on robots.txt I could find is on Wikipedia. (The supposedly definitive source is http://www.robotstxt.org, but the site is down at the moment.)
According to the Wikipedia page, the standard defines just two fields: User-agent: and Disallow:. The Disallow: field does not allow explicit wildcards; each "disallowed" path is actually a path prefix, i.e. it matches any path that starts with the specified value.
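If you want to sanity-check this prefix behaviour, Python's standard-library urllib.robotparser implements the same matching; here is a minimal sketch (the example.com URLs are placeholders):

from urllib import robotparser

# Parse an in-memory robots.txt instead of fetching one over HTTP.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /?q=",
])

# URLs whose path-plus-query starts with /?q= are blocked ...
print(rp.can_fetch("*", "http://example.com/?q=admin/"))       # False
print(rp.can_fetch("*", "http://example.com/?q=user/login/"))  # False
# ... while other URLs stay crawlable.
print(rp.can_fetch("*", "http://example.com/node/1"))          # True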
The Allow: field is a non-standard extension, and any support for explicit wildcards in Disallow: would also be a non-standard extension. If you use these, you have no right to expect that a (legitimate) web crawler will understand them.
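(As an aside, some major crawlers, Googlebot for example, document their own support for * and $ wildcards, so you may see the same intent written as the line below. This is a crawler-specific extension, not something the base standard guarantees, and it also matches ?q= appearing later in a URL, which is broader than the plain prefix rule above.)

Disallow: /*?q=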
This is not a matter of crawlers being "smart" or "dumb": it is all about standards compliance and interoperability. For example, any web crawler that did "smart" things with explicit wildcard characters in a Disallow: line would break on (hypothetical) robots.txt files where those characters were intended to be interpreted literally.