Google: Disable certain querystring in robots.txt


Problem Description

http://www.site.com/shop/maxi-dress?colourId=94&optId=694
http://www.site.com/shop/maxi-dress?colourId=94&optId=694&product_type=sale

I have thousands of URLs like the above, in different combinations and with different names. I also have duplicates of these URLs which carry the query string product_type=sale.

I want to prevent Google from indexing anything with product_type=sale.

Is this possible in robots.txt?

Recommended Answer

Google supports wildcards in robots.txt. The following directive in robots.txt will prevent Googlebot from crawling any page that has any parameters:

Disallow: /*?
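
If the goal is to block only the duplicate sale pages rather than every parameterized URL, a narrower rule should also work. This is a sketch that relies on the same Google-specific wildcard support, with the parameter name taken from your example URLs:

Disallow: /*product_type=sale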

Wildcard rules like these won't stop many other spiders from crawling the URLs, because wildcards are not part of the standard robots.txt syntax.

Google may take its time removing the blocked URLs from the search index; the extra URLs may still be indexed for months. You can speed the process up by using the "Remove URLs" feature in Webmaster Tools after the URLs have been blocked, but that is a manual process in which you have to paste in each individual URL that you want removed.

Using this robots.txt rule may also hurt your site's Google rankings if Googlebot can't find a version of the URL without parameters. If you commonly link to the versions with parameters, you probably don't want to block them in robots.txt; it would be better to use one of the other options below.

A better option is to use the rel="canonical" link tag on each of your pages.

So both of your example URLs would have the following in the head section:

<link rel="canonical" href="http://www.site.com/shop/maxi-dress">

That tells Googlebot not to index the many variations of the page, and to index only the "canonical" version of the URL that you choose. Unlike with robots.txt, Googlebot will still be able to crawl all your pages and assign value to them, even when they use a variety of URL parameters.
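
As an illustration of what "canonical" means here: if each product page is identified by its path alone, the canonical href can be derived by simply dropping the query string. A minimal sketch in Python (the canonical_url helper is hypothetical, not part of the original answer):

from urllib.parse import urlsplit, urlunsplit

def canonical_url(url):
    # Keep scheme, host, and path; drop the query string and fragment
    # so every parameterized variant points at one canonical URL.
    scheme, netloc, path, _query, _fragment = urlsplit(url)
    return urlunsplit((scheme, netloc, path, "", ""))

# Both example URLs collapse to the same canonical target:
print(canonical_url("http://www.site.com/shop/maxi-dress?colourId=94&optId=694"))
print(canonical_url("http://www.site.com/shop/maxi-dress?colourId=94&optId=694&product_type=sale"))
# Output (both lines): http://www.site.com/shop/maxi-dress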

Another option is to log into Google Webmaster Tools and use the "URL Parameters" feature in the "Crawl" section.

Once there, click on "Add parameter". You can set product_type to "Does not affect page content" so that Google doesn't crawl and index pages with that parameter.

Do the same for each of the parameters you use that don't change the page.
