Block Google robots for URLs containing a certain word


Question

My client has a load of pages which they don't want indexed by Google - they are all called

http://example.com/page-xxx

so they are /page-123 or /page-2 or /page-25 etc.

Is there a way to stop Google indexing any page that starts with /page-xxx using robots.txt?

Would something like this work?

Disallow: /page-*

Thanks

Accepted answer

In the first place, a line that says Disallow: /post-* isn't going to do anything to prevent crawling of pages of the form "/page-xxx". Did you mean to put "page" in your Disallow line, rather than "post"?

Disallow says, in essence, "disallow URLs that start with this text". So your example line will disallow any URL that starts with "/post-". (That is, the file is in the root directory and its name starts with "post-".) The asterisk in this case is superfluous, as it's implied.

Your question is unclear as to where the pages are. If they're all in the root directory, then a simple Disallow: /page- will work. If they're scattered across directories in many different places, then things are a bit more difficult.
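For the root-directory case, a minimal robots.txt would look like the following (the User-agent: * line applies the rule to every crawler; use User-agent: Googlebot instead if only Google should be affected):

User-agent: *
Disallow: /page-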

As @user728345 pointed out, the easiest way (from a robots.txt standpoint) to handle this is to gather all of the pages you don't want crawled into one directory and disallow access to that. But I understand it if you can't move all those pages.
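If moving the pages is an option, the whole rule collapses to a single directory entry. The directory name /unindexed/ below is only a placeholder; the trailing slash disallows everything underneath it:

User-agent: *
Disallow: /unindexed/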

For Googlebot specifically, and for other bots that support the same wildcard semantics (there are a surprising number of them, including mine), the following should work:

Disallow: /*page-

That will match anything that contains "page-" anywhere in the path. However, it will also block something like "/test/thispage-123.html". If you want to prevent that, then I think (I'm not sure, as I haven't tried it) that this will work:

Disallow: */page-
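To sanity-check the difference between the two patterns, here is a minimal Python sketch that re-implements the wildcard semantics described above (anchored at the start of the path, with * matching any run of characters and $ anchoring the end). This is an illustration only, not Google's actual matcher, and robots_rule_matches is a helper written for this example; real crawlers may differ in edge cases:

import re

def robots_rule_matches(pattern: str, path: str) -> bool:
    # Translate a Disallow pattern into a regex: '*' becomes '.*',
    # '$' stays an end anchor, everything else is matched literally.
    regex = ""
    for ch in pattern:
        if ch == "*":
            regex += ".*"
        elif ch == "$":
            regex += "$"
        else:
            regex += re.escape(ch)
    # re.match anchors at the start of the path, as robots rules do.
    return re.match(regex, path) is not None

# "/*page-" matches "page-" anywhere in the path:
print(robots_rule_matches("/*page-", "/page-123"))                # True
print(robots_rule_matches("/*page-", "/test/thispage-123.html"))  # True

# "*/page-" only matches "page-" at the start of a path segment:
print(robots_rule_matches("*/page-", "/page-123"))                # True
print(robots_rule_matches("*/page-", "/test/thispage-123.html"))  # False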
