Disallow pdf files from indexing (Robots.txt)


Question

I have links being indexed that shouldn't be. I need to remove them from Google. What should I enter in robots.txt? Example link: http://sitename.com/wp-content/uploads/2014/02/The-Complete-Program-2014.pdf

Answer

With robots.txt, you can disallow crawling, not indexing.

With this robots.txt

User-agent: *
Disallow: /wp-content/uploads/2014/02/The-Complete-Program-2014.pdf

any URL whose path starts with /wp-content/uploads/2014/02/The-Complete-Program-2014.pdf is not allowed to be crawled.
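You can check this prefix-matching behavior yourself with Python's standard-library `urllib.robotparser` (a sketch; the rules are fed in as inline lines for illustration):

```python
from urllib import robotparser

# Parse the robots.txt rules shown above
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /wp-content/uploads/2014/02/The-Complete-Program-2014.pdf",
])

# The listed PDF is blocked for any user agent...
print(rp.can_fetch("*", "http://sitename.com/wp-content/uploads/2014/02/The-Complete-Program-2014.pdf"))  # False

# ...while other paths on the site remain crawlable
print(rp.can_fetch("*", "http://sitename.com/about"))  # True
```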

But if a bot finds this URL in some other way (e.g., linked by someone else), they might still index it (without ever crawling/visiting it). The same goes for search engines that already indexed it: they might keep it (but will no longer visit it).

To disallow indexing, you could use the HTTP header X-Robots-Tag with the noindex parameter. In that case, you should not block crawling of the file in robots.txt, otherwise bots would never be able to see your headers (and so they would never know that you don’t want this file to get indexed).
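For example, on an Apache server you could send this header for every PDF via `.htaccess` (a sketch, assuming Apache with mod_headers enabled; other servers such as nginx have equivalent directives):

```apache
# Send "X-Robots-Tag: noindex" with every PDF served,
# telling search engines not to index these files.
# Requires mod_headers.
<FilesMatch "\.pdf$">
    Header set X-Robots-Tag "noindex"
</FilesMatch>
```

Once crawlers re-fetch the PDF and see the header, they can drop it from their index — which is why crawling must stay allowed in robots.txt.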
