Robots.txt deny, for a #! URL
Question
I am trying to add a deny rule to a robots.txt file, to deny access to a single page.
The website URLs work as follows:
Javascript then swaps out the DIV that is displayed, based on the URL.
How would I request a search engine spider not list the following:
Thanks in advance.
Answer
You can actually do this multiple ways, but here are the two simplest.
You have to exclude the URLs that Googlebot is going to fetch, which aren't the AJAX hashbang values but instead the translated ?_escaped_fragment_=key=value URLs.
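For context, the hashbang-to-query-string translation can be sketched as a small Python helper; the example.com URL and the function name are illustrative, and real crawlers may percent-encode the fragment slightly differently:

```python
from urllib.parse import quote

def escaped_fragment_url(url: str) -> str:
    """Rewrite a #! (hashbang) URL the way an AJAX-crawling bot would."""
    base, _, fragment = url.partition("#!")
    # Append as a new query string, or extend an existing one.
    sep = "&" if "?" in base else "?"
    return f"{base}{sep}_escaped_fragment_={quote(fragment, safe='/')}"

print(escaped_fragment_url("http://example.com/#!/super-secret"))
# http://example.com/?_escaped_fragment_=/super-secret
```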
In your robots.txt file specify:
Disallow: /?_escaped_fragment_=/super-secret
Disallow: /index.php?_escaped_fragment_=/super-secret
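One way to sanity-check these rules locally is Python's built-in urllib.robotparser; note that its matching is only an approximation of Googlebot's real behavior, and example.com is a placeholder host:

```python
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
# Feed the rules directly instead of fetching a live robots.txt.
rp.parse([
    "User-agent: *",
    "Disallow: /?_escaped_fragment_=/super-secret",
    "Disallow: /index.php?_escaped_fragment_=/super-secret",
])

# The translated URL Googlebot would fetch is blocked...
print(rp.can_fetch("Googlebot", "http://example.com/?_escaped_fragment_=/super-secret"))
# ...while other pages remain crawlable.
print(rp.can_fetch("Googlebot", "http://example.com/index.php"))
```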
When in doubt, you should always use the Google Webmaster Tool » "Fetch As Googlebot".
If the page has already been indexed by Googlebot, using a robots.txt file won't remove it from the index. You'll either have to use the Google Webmaster Tools URL removal tool after you apply the robots.txt, or instead you can add a noindex command to the page, via a <meta> tag or an X-Robots-Tag in the HTTP headers.
It would look like:
<meta name="ROBOTS" content="NOINDEX, NOFOLLOW" />
Or
X-Robots-Tag: noindex
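If you control the server, the header route could look like this minimal WSGI sketch in Python; the app name and page body are hypothetical:

```python
# Minimal WSGI sketch: attach X-Robots-Tag to the response so compliant
# crawlers drop the page from their index.
def super_secret_app(environ, start_response):
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("X-Robots-Tag", "noindex"),  # header equivalent of the <meta> tag
    ]
    start_response("200 OK", headers)
    return [b"<html><body>super-secret content</body></html>"]
```

Any WSGI server (e.g. the standard library's wsgiref.simple_server) can serve it for testing.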