How to detect web crawlers for SEO, using Express?


Question

I've been searching for npm packages, but they all seem unmaintained and rely on outdated user-agent databases. Is there a reliable, up-to-date package that helps me detect crawlers (mostly Google, Facebook, etc., for SEO)? Or, if there is no such package, can I write it myself (probably based on an up-to-date user-agent database)?

To be clearer, I'm building an isomorphic/universal React website. I want it to be indexed by search engines and its title/meta data to be fetchable by Facebook, but I don't want to pre-render every normal request, so that the server isn't overloaded. The solution I'm considering is to pre-render only for requests coming from crawlers.

Answer

The best solution I've found is the useragent library, which allows you to do this:

var useragent = require('useragent');
// for an actual request use: useragent.parse(req.headers['user-agent']);
var agent = useragent.parse('Googlebot-News');

// will log true
console.log(agent.device.toJSON().family === 'Spider')

It is fast and kept up-to-date pretty well, and seems like the best approach. You can run the above script in your browser: runkit
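For the original use case of pre-rendering only for crawler requests, a minimal Express sketch might look like the following. Only the useragent.parse() call and the 'Spider' device-family check come from the answer above; the route handler and the placeholder HTML responses are illustrative assumptions standing in for whatever server-side React rendering you actually use.

var express = require('express');
var useragent = require('useragent');

var app = express();

// Flag requests whose user-agent parses as a known bot/spider.
app.use(function (req, res, next) {
  var agent = useragent.parse(req.headers['user-agent']);
  // The useragent library classifies known crawlers under the 'Spider' device family.
  req.isCrawler = agent.device.toJSON().family === 'Spider';
  next();
});

app.get('*', function (req, res) {
  if (req.isCrawler) {
    // Crawler: send fully pre-rendered HTML (placeholder string here).
    res.send('<html><head><title>Pre-rendered page</title></head><body>...</body></html>');
  } else {
    // Normal visitor: send the client-side shell and let React render in the browser.
    res.send('<html><head><title>App</title></head><body><div id="root"></div><script src="/bundle.js"></script></body></html>');
  }
});

app.listen(3000);

With this in place, only requests identified as crawlers pay the cost of server-side rendering, which is the trade-off the question describes.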

