Detect Search Crawlers via JavaScript
Question
I am wondering how I would go about detecting search crawlers. The reason I ask is that I want to suppress certain JavaScript calls if the user agent is a bot.
I have found an example of how to detect a certain browser, but I am unable to find examples of how to detect a search crawler:
/MSIE (\d+\.\d+);/.test(navigator.userAgent); // test for MSIE x.x
Examples of search crawlers I want to block:
Google
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Googlebot/2.1 (+http://www.googlebot.com/bot.html)
Googlebot/2.1 (+http://www.google.com/bot.html)
Baidu
Baiduspider+(+http://www.baidu.com/search/spider_jp.html)
Baiduspider+(+http://www.baidu.com/search/spider.htm)
BaiDuSpider
Recommended answer
This is the regex that the Ruby user-agent library agent_orange uses to test whether a userAgent looks like a bot. You can narrow it down to specific bots by referencing a list of bot userAgent strings:
/bot|googlebot|crawler|spider|robot|crawling/i
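To narrow the generic pattern down to just the crawlers listed in the question, one option is a smaller pattern that names them explicitly. This is a minimal sketch; the names SPECIFIC_BOTS and isListedCrawler are illustrative assumptions and do not come from agent_orange:

```javascript
// Hypothetical pattern: matches only the crawlers named in the question
// (Googlebot and Baiduspider), case-insensitively.
const SPECIFIC_BOTS = /googlebot|baiduspider/i;

// Hypothetical helper: returns true if the given user-agent string
// belongs to one of the listed crawlers. A missing or empty value
// is treated as "not a listed crawler".
function isListedCrawler(userAgent) {
  return SPECIFIC_BOTS.test(String(userAgent || ""));
}
```

In a browser this would be called as isListedCrawler(navigator.userAgent).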
For example, if you have some object such as util.browser, you can store what type of device the user is on:
util.browser = {
  // true when the userAgent looks like a crawler
  bot: /bot|googlebot|crawler|spider|robot|crawling/i.test(navigator.userAgent),
  mobile: ...,
  desktop: ...
}
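To actually suppress calls for bots, the flag can be wrapped in a small guard. The following is a sketch under stated assumptions: isBot and unlessBot are hypothetical helper names, and the user-agent string is passed in as a parameter so the check can also be exercised outside a browser:

```javascript
// Same pattern as in the answer above.
const BOT_PATTERN = /bot|googlebot|crawler|spider|robot|crawling/i;

// Hypothetical helper: true when the user-agent string looks like a bot.
function isBot(userAgent) {
  return BOT_PATTERN.test(String(userAgent || ""));
}

// Hypothetical wrapper: runs fn only when the user agent does not look
// like a bot, and returns fn's result; returns undefined when suppressed.
function unlessBot(userAgent, fn) {
  if (isBot(userAgent)) {
    return undefined; // call suppressed for crawlers
  }
  return fn();
}
```

In a browser, a call might look like unlessBot(navigator.userAgent, function () { /* the JavaScript call to suppress for bots */ }). User-agent strings can be spoofed, so this check is a heuristic rather than a guarantee.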