Controlling Search Engine Index Removals


Question

My site has some particular pages that are:

  1. Indexed in search engines, but I want them removed from the index.
  2. Numerous, because they are dynamic (based on the query string).
  3. Somewhat "heavy". (An overzealous bot can put more strain on the server than I'd like.)

Because of #2, I'm just going to let them slowly get removed naturally, but I need to settle on a plan.

I started out doing the following:

  1. Bots: Abort execution using user-agent detection in the application, and send a basically blank response. (I don't mind if some bots slip through and render the real page, but I'm just blocking some common ones.)
  2. Bots: Throw a 403 (forbidden) response code.
  3. All clients: Send "X-Robots-Tag: noindex" header.
  4. All clients: Added rel="nofollow" to the links that lead to these pages.
  5. Did not disallow bots to those pages in robots.txt. (I think it's only useful to disallow bots if you do so from the very beginning, or else after those pages are completely removed from search engines; otherwise, engines can't crawl/access those pages to discover/honor the noindex header, so they wouldn't remove them. I mention this because I think robots.txt might commonly be misunderstood, and it might get suggested as an inappropriate silver bullet.)
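The steps above can be sketched as a single request handler. This is a minimal, framework-agnostic illustration, not the asker's actual code; `BLOCKED_BOTS`, `is_blocked_bot`, and `respond` are hypothetical names:

```python
# Hedged sketch of the original plan: user-agent detection, a basically
# blank 403 for known bots, and X-Robots-Tag: noindex for all clients.
# The bot list and function names are illustrative assumptions.

BLOCKED_BOTS = ("googlebot", "bingbot", "yandexbot")  # a few common crawlers

def is_blocked_bot(user_agent: str) -> bool:
    """Step 1: user-agent detection; some bots slipping through is fine."""
    ua = (user_agent or "").lower()
    return any(bot in ua for bot in BLOCKED_BOTS)

def respond(user_agent: str):
    """Return (status, headers, body) for one of the heavy dynamic pages."""
    headers = {"X-Robots-Tag": "noindex"}  # step 3: sent to all clients
    if is_blocked_bot(user_agent):
        # Steps 1-2: abort rendering and send a basically blank 403 response.
        return 403, headers, ""
    return 200, headers, "<html>the real, expensive page</html>"
```

Step 4 (`rel="nofollow"` on inbound links) and step 5 (no robots.txt disallow) live outside the handler, in the pages that link here and in robots.txt respectively.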

However, since then, I think some of those steps were either fairly useless toward my goal, or actually problematic.

  • I'm not sure if throwing a 403 to bots is a good idea. Do the search engines see that and completely disregard the X-Robots-Tag? Is it better to just let them respond 200?
  • I think rel="nofollow" only potentially affects target page rank, and doesn't affect crawling at all.

The rest of the plan seems okay (correct me if I'm wrong), but I'm not sure about the above bullets in the grand scheme.

Answer

I think this is a good plan:

  1. Bots: Abort execution using user-agent detection in the application, and send a basically blank response. (I don't mind if some bots slip through and render the real page, but I'm just blocking some common ones.)
  2. Bots: Send a 410 (Gone) response code.
"In general, sometimes webmasters get a little too caught up in the tiny little details and so if the page is gone, it's fine to serve a 404, if you know it's gone for real it's fine to serve a 410,"

- http://goo.gl/AwJdEz

  • All clients: Send "X-Robots-Tag: noindex" header. I think this would be extraneous for the known bots who got the 410, but it would cover unknown engines' bots.
  • All clients: Add rel="nofollow" to the links that lead to these pages. This probably isn't completely necessary, but it wouldn't hurt.
  • Do not disallow bots to those pages in robots.txt. (It's only useful to disallow bots if you do so from the very beginning, or else after those pages are completely removed from search engines; otherwise, engines can't crawl/access those pages to discover/honor the noindex header, so they wouldn't remove them. I mention this because I think robots.txt might commonly be misunderstood, and it might get suggested as an inappropriate silver bullet.)
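The revised plan differs from the original in one status code: known bots now get a 410 (Gone) instead of a 403, while every client still receives the `X-Robots-Tag: noindex` header and robots.txt deliberately leaves the pages crawlable. A hedged sketch, with illustrative names rather than anyone's real code:

```python
# Sketch of the final plan. KNOWN_BOTS and handle_heavy_page() are assumed
# names. There is intentionally NO robots.txt Disallow for these pages, so
# engines can still crawl them and see the 410 / noindex.

KNOWN_BOTS = ("googlebot", "bingbot", "yandexbot")

def handle_heavy_page(user_agent: str):
    """Return (status, headers, body) for a page being removed from indexes."""
    headers = {"X-Robots-Tag": "noindex"}  # covers unknown engines' bots too
    ua = (user_agent or "").lower()
    if any(bot in ua for bot in KNOWN_BOTS):
        # 410 (Gone): "if you know it's gone for real it's fine to serve a 410"
        return 410, headers, ""
    return 200, headers, "<html>the real page</html>"
```

Using 410 rather than 403 matters because a 403 tells crawlers "you may not see this", whereas a 410 tells them "this no longer exists", which is the signal that actually prompts index removal.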