How to detect web crawlers for SEO, using Express?
Question
I've been searching for npm packages, but they all seem unmaintained and rely on outdated user-agent databases. Is there a reliable, up-to-date package that can help me detect crawlers (mainly from Google, Facebook, and similar, for SEO)? If there is no such package, can I write it myself, probably based on an up-to-date user-agent database?
To be clearer: I'm building an isomorphic/universal React website. I want it to be indexed by search engines, and I want Facebook to be able to fetch its title and meta data, but I don't want to pre-render every normal request, because that would overload the server. The solution I have in mind is to pre-render only for requests that come from crawlers.
Recommended answer
The best solution I've found is the useragent library, which lets you do this:
const useragent = require('useragent');
// For an actual request, use: useragent.parse(req.headers['user-agent']);
const agent = useragent.parse('Googlebot-News');
// Logs true for this user agent
console.log(agent.device.toJSON().family === 'Spider');
It is fast and kept reasonably up to date, so it seems like the best approach. You can run the above script in your browser on RunKit.
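On the "can I write it myself?" part of the question, here is a minimal sketch of a hand-rolled check, as an alternative to the library above. It assumes that matching a short list of substrings against the User-Agent header is good enough for your purposes. The names CRAWLER_PATTERN, isCrawler, and detectCrawler, as well as the list of bots, are made up for illustration and are not part of useragent or Express. The list is deliberately incomplete and would need maintaining.

```javascript
// Hypothetical, non-exhaustive list of substrings that some well-known
// crawlers include in their User-Agent header. Extend it as needed.
const CRAWLER_PATTERN =
  /googlebot|bingbot|baiduspider|duckduckbot|facebookexternalhit|twitterbot|linkedinbot/i;

// Returns true when the given User-Agent string looks like a known crawler.
// A missing or non-string header is treated as "not a crawler".
function isCrawler(userAgent) {
  return typeof userAgent === 'string' && CRAWLER_PATTERN.test(userAgent);
}

// Function with the standard Express middleware signature (req, res, next).
// It only flags the request; later handlers decide whether to pre-render.
function detectCrawler(req, res, next) {
  req.isCrawler = isCrawler(req.headers['user-agent']);
  next();
}
```

One way to use it would be to register it with app.use(detectCrawler) and then branch on req.isCrawler in the route handler, pre-rendering only when it is true. Keep in mind that the User-Agent header can be spoofed, so this is a hint for choosing a rendering path, not a security check.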