Allow search bots to crawl your sites without session IDs

Question

Google's Webmaster Guidelines state:

Allow search bots to crawl your sites without session IDs or arguments that track their path through the site. These techniques are useful for tracking individual user behavior, but the access pattern of bots is entirely different. Using these techniques may result in incomplete indexing of your site, as bots may not be able to eliminate URLs that look different but actually point to the same page.
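To illustrate what "URLs that look different but actually point to the same page" means, here is a minimal Python sketch that strips a session parameter from a URL so that two session-tagged links collapse to one address. The parameter names (`sid`, `sessionid`) and the helper name `canonical_url` are assumptions made for illustration; the question does not say how the site carries its session GUIDs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names: adjust to whatever parameter the site really uses.
TRACKING_PARAMS = {"sid", "sessionid"}


def canonical_url(url):
    """Return url with the session-tracking query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


# Two links that differ only in the session value map to the same page.
first = canonical_url("http://example.com/page.aspx?id=7&sid=abc123")
second = canonical_url("http://example.com/page.aspx?sid=zzz999&id=7")
print(first == second)
```

A crawler that sees only the session-tagged forms has no reliable way to know they are the same page, which is the indexing problem the guideline describes.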

My ASP.NET 1.1 site uses custom authentication/authorization and relies heavily on session GUIDs (similar to this approach). I'm worried that allowing traffic that is not session-tracked will either break my existing code or introduce security vulnerabilities.

What best practices are there for allowing bots that are not session-tracked to crawl a site that normally tracks sessions? And are there any ways of detecting search bots other than inspecting the user agent? (I don't want people to spoof themselves as Googlebot to get around my session tracking.)

Recommended answer

The correct way to detect bots is by host entry (Dns.GetHostEntry), that is, by a reverse DNS lookup on the IP address the request came from. Some lame robots require you to track by IP address, but the popular ones generally don't. Googlebot requests come from *.googlebot.com. After you get the host entry, check that IPHostEntry.AddressList contains the original IP address; this confirms that the host name really resolves back to the requester, which a spoofed reverse DNS record would not.

Do not even look at the user agent when verifying robots.
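The check described above (reverse lookup, host name test, then confirming the address list contains the original IP) can be sketched as follows. This is a Python illustration rather than .NET code: `socket.gethostbyaddr` and `socket.gethostbyname_ex` stand in for the lookups that Dns.GetHostEntry performs, and the function name `is_verified_googlebot`, the suffix list, and the injectable resolver parameters are assumptions added so the logic can be exercised without network access.

```python
import socket

# The answer names only *.googlebot.com; extend this tuple if you trust
# other crawler domains (an assumption you would need to verify yourself).
GOOGLEBOT_SUFFIXES = (".googlebot.com",)


def default_reverse_lookup(ip):
    """Return the primary host name that reverse DNS reports for ip."""
    hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    return hostname


def default_forward_lookup(hostname):
    """Return the IPv4 addresses that hostname resolves to."""
    _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    return addresses


def is_verified_googlebot(ip,
                          reverse_lookup=default_reverse_lookup,
                          forward_lookup=default_forward_lookup):
    """Forward-confirmed reverse DNS check; the user agent is never consulted."""
    try:
        hostname = reverse_lookup(ip)
    except OSError:
        # No reverse record, or the lookup failed: treat as unverified.
        return False

    hostname = hostname.rstrip(".").lower()
    if not hostname.endswith(GOOGLEBOT_SUFFIXES):
        return False

    try:
        addresses = forward_lookup(hostname)
    except OSError:
        return False

    # The host name must resolve back to the address that made the request.
    return ip in addresses
```

In a request handler you would pass the remote address of the connection, for example `is_verified_googlebot(remote_ip)`, and cache the result per IP, since two DNS lookups on every request would be slow. The default forward lookup here covers IPv4 only.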

See also http://googlewebmastercentral.blogspot.com/2006/09/how-to-verify-googlebot.html
