Block Google robots for URLs containing a certain word
Question
My client has a load of pages which they don't want indexed by Google - they are all called
http://example.com/page-xxx
so they are /page-123 or /page-2 or /page-25 etc.
Is there a way to stop Google indexing any page that starts with /page-xxx using robots.txt?
Would something like this work?
Disallow: /page-*
Thanks
Accepted answer
In the first place, a line that says Disallow: /post-*
isn't going to do anything to prevent crawling of pages of the form "/page-xxx". Did you mean to put "page" in your Disallow line, rather than "post"?
Disallow says, in essence, "disallow URLs that start with this text". So your example line will disallow any URL that starts with "/post-". (That is, the file is in the root directory and its name starts with "post-".) The asterisk in this case is superfluous, as it's implied.
Your question is unclear as to where the pages are. If they're all in the root directory, then a simple Disallow: /page-
will work. If they're scattered across directories in many different places, then things are a bit more difficult.
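For the simple root-directory case, the whole robots.txt would be just a couple of lines (the `User-agent: *` record applies the rule to all compliant crawlers, not just Googlebot):

```
User-agent: *
Disallow: /page-
```

Because Disallow rules are prefix matches, this covers /page-123, /page-2, /page-25 and so on, but not /blog/page-123.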
As @user728345 pointed out, the easiest way (from a robots.txt standpoint) to handle this is to gather all of the pages you don't want crawled into one directory, and disallow access to that. But I understand if you can't move all those pages.
For Googlebot specifically, and other bots that support the same wildcard semantics (there are a surprising number of them, including mine), the following should work:
Disallow: /*page-
That will match anything that contains "page-" anywhere. However, that will also block something like "/test/thispage-123.html". If you want to prevent that, then I think (I'm not sure, as I haven't tried it) that this will work:
Disallow: */page-
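You can check the behaviour of the two wildcard patterns without waiting on a crawler. The sketch below (the helper name `robots_pattern_matches` is made up for illustration) implements just the Googlebot-style matching rule the answer relies on: patterns match from the start of the URL path, `*` matches any run of characters, and `$` anchors the end. It is not a full robots.txt parser; real crawlers also handle Allow rules and longest-match precedence.

```python
import re

def robots_pattern_matches(pattern: str, path: str) -> bool:
    """Test a URL path against a robots.txt rule using
    Googlebot-style wildcards: '*' = any character run,
    '$' = end of URL; matching is anchored at the path start."""
    regex = ""
    for ch in pattern:
        if ch == "*":
            regex += ".*"
        elif ch == "$":
            regex += "$"
        else:
            regex += re.escape(ch)
    return re.match(regex, path) is not None

# "/*page-" matches "page-" anywhere, even mid-filename:
print(robots_pattern_matches("/*page-", "/page-123"))                # True
print(robots_pattern_matches("/*page-", "/test/thispage-123.html"))  # True

# "*/page-" requires a "/" immediately before "page-",
# so it blocks /page-xxx in any directory but not "thispage-":
print(robots_pattern_matches("*/page-", "/test/thispage-123.html"))  # False
print(robots_pattern_matches("*/page-", "/dir/page-5"))              # True
```

Running this confirms the answer's reasoning: `/*page-` overblocks names like thispage-123.html, while `*/page-` only catches paths where a segment actually starts with "page-".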