How to tell if a web request is coming from Google's crawler?

Question

From the HTTP server's perspective.

Recommended answer

I captured a Google crawler request in my ASP.NET application; here is what the crawler's signature looks like.

Requesting IP: 66.249.71.113
Client: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

My logs show many different Google crawler IPs in the 66.249.71.* range. All of these IPs geolocate to Mountain View, CA, USA.

A simple way to check whether a request claims to come from Google's crawler is to look in the User-Agent for the markers Googlebot and http://www.google.com/bot.html (the sample here accepts either one). Because the same requesting client shows up from many different IPs, I would not recommend checking IPs. That is where the client identity comes into the picture, so verify the client identity instead.

Here is some sample code in C#.

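    // Requires: using System;
    // Request.UserAgent is null when the client sends no User-Agent header.
    string userAgent = Request.UserAgent ?? string.Empty;

    // Case-insensitive match on either marker found in Googlebot's User-Agent.
    if (userAgent.IndexOf("googlebot", StringComparison.OrdinalIgnoreCase) >= 0 ||
        userAgent.IndexOf("google.com/bot.html", StringComparison.OrdinalIgnoreCase) >= 0)
    {
        // The User-Agent claims to be Googlebot.
    }
    else
    {
        // The User-Agent does not identify itself as Googlebot.
    }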

It is important to note that any HTTP client can easily fake this, because the User-Agent header is set entirely by the client.
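
Since the User-Agent alone can be forged, one stronger check (not part of the original answer) is the reverse-then-forward DNS lookup that Google describes for verifying Googlebot. The following is a minimal sketch under stated assumptions: the class name `GooglebotVerifier`, its method names, and the suffix list are hypothetical choices made for illustration, and the list of trusted domains should be checked against Google's current documentation.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Net.Sockets;

// Hypothetical helper for illustration; not part of the original answer.
public static class GooglebotVerifier
{
    // Assumption: domains used by Google's crawler hosts. Verify against Google's docs.
    private static readonly string[] TrustedSuffixes = { ".googlebot.com", ".google.com" };

    // Pure check: does a reverse-DNS host name end with a trusted domain?
    public static bool IsGoogleHostName(string hostName)
    {
        if (string.IsNullOrWhiteSpace(hostName)) return false;
        string host = hostName.Trim().TrimEnd('.'); // tolerate a trailing root dot
        return TrustedSuffixes.Any(s => host.EndsWith(s, StringComparison.OrdinalIgnoreCase));
    }

    // Reverse lookup (IP -> host name), then forward lookup (host name -> IPs).
    public static bool IsVerifiedGooglebot(IPAddress remoteIp)
    {
        if (remoteIp == null) return false;
        try
        {
            string hostName = Dns.GetHostEntry(remoteIp).HostName;
            if (!IsGoogleHostName(hostName)) return false;

            // The forward lookup must include the original IP; a PTR record
            // on its own is controlled by whoever owns the IP block.
            return Dns.GetHostAddresses(hostName).Any(a => a.Equals(remoteIp));
        }
        catch (SocketException)
        {
            return false; // DNS failure or no PTR record: treat as unverified
        }
        catch (ArgumentException)
        {
            return false; // address not usable for a lookup
        }
    }
}
```

In an ASP.NET page this might be called as `GooglebotVerifier.IsVerifiedGooglebot(IPAddress.Parse(Request.UserHostAddress))`, assuming the application sees the real client address rather than a proxy's. DNS lookups are slow, so it is probably worth running this only when the User-Agent already claims to be Googlebot and caching the result per IP.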
