Detect Search Crawlers via JavaScript


Question

How would I go about detecting search crawlers? I ask because I want to suppress certain JavaScript calls if the user agent is a bot.

I have found an example of how to detect a certain browser, but I am unable to find examples of how to detect a search crawler:

/MSIE (\d+\.\d+);/.test(navigator.userAgent); // test for MSIE x.x

Examples of search crawlers I want to block:

Google 
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) 
Googlebot/2.1 (+http://www.googlebot.com/bot.html) 
Googlebot/2.1 (+http://www.google.com/bot.html) 

Baidu 
Baiduspider+(+http://www.baidu.com/search/spider_jp.html) 
Baiduspider+(+http://www.baidu.com/search/spider.htm) 
BaiDuSpider 


Recommended Answer

This is the regex that the Ruby user-agent library agent_orange uses to test whether a userAgent looks like a bot. You can narrow it down to specific bots by consulting a list of bot userAgent strings:

/bot|googlebot|crawler|spider|robot|crawling/i
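As a rough illustration of how this pattern behaves, the sketch below runs it against the crawler strings from the question plus one ordinary desktop browser string. The names `BOT_PATTERN`, `SPECIFIC_BOTS`, and `looksLikeBot` are hypothetical, the Chrome user agent is an assumed sample, and the narrowed pattern is only one possible way to target Googlebot and Baiduspider specifically.

```javascript
// Broad pattern from the answer: matches anything that looks like a bot.
const BOT_PATTERN = /bot|googlebot|crawler|spider|robot|crawling/i;

// Hypothetical narrowed pattern: only the two crawlers named in the question.
const SPECIFIC_BOTS = /googlebot|baiduspider/i;

// Hypothetical helper: true when the user-agent string matches the pattern.
function looksLikeBot(userAgent, pattern = BOT_PATTERN) {
  return pattern.test(userAgent);
}

const samples = [
  'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
  'Baiduspider+(+http://www.baidu.com/search/spider.htm)',
  'BaiDuSpider',
  // Assumed example of a regular browser, not taken from the question.
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
];

for (const ua of samples) {
  console.log(looksLikeBot(ua), looksLikeBot(ua, SPECIFIC_BOTS), ua);
}
```

In a browser you would pass `navigator.userAgent` instead of a hard-coded string. Keep in mind that user-agent strings are easy to spoof, so this is a heuristic rather than a guarantee.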

For example, if you have some object, util.browser, you can store what type of device the user is on:

util.browser = {
   bot: /bot|googlebot|crawler|spider|robot|crawling/i.test(navigator.userAgent),
   mobile: ...,
   desktop: ...
}
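To connect this back to the question, here is a minimal sketch of how such a flag might be used to suppress a JavaScript call for bots. `detectBrowser` and `runUnlessBot` are hypothetical names introduced for illustration, and only the `bot` flag is filled in, since the answer leaves the `mobile` and `desktop` checks unspecified.

```javascript
// Hypothetical factory: builds the flags from a user-agent string so the
// logic can be exercised outside a browser as well.
function detectBrowser(userAgent) {
  return {
    bot: /bot|googlebot|crawler|spider|robot|crawling/i.test(userAgent),
  };
}

// Hypothetical guard: runs the callback only for non-bot visitors.
// Returns true if the callback ran, false if it was suppressed.
function runUnlessBot(browser, callback) {
  if (browser.bot) {
    return false;
  }
  callback();
  return true;
}

// In a page you would call detectBrowser(navigator.userAgent) instead.
const crawler = detectBrowser('Googlebot/2.1 (+http://www.google.com/bot.html)');
runUnlessBot(crawler, () => console.log('this call is skipped for crawlers'));
```

Wrapping the check in a guard keeps the bot test in one place, so individual call sites do not each need to repeat the regex.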
