How to detect web crawlers for SEO, using Express?


Question

I've been searching for npm packages, but they all seem unmaintained and rely on outdated user-agent databases. Is there a reliable, up-to-date package that helps me detect crawlers (mostly Google, Facebook, etc., for SEO)? Or, if there is no such package, can I write it myself (probably based on an up-to-date user-agent database)?

To be clearer, I'm building an isomorphic/universal React website. I want it to be indexed by search engines and its title/meta data to be fetchable by Facebook, but I don't want to pre-render every normal request, so that the server isn't overloaded. The solution I'm considering is to pre-render only for requests coming from crawlers.

Answer

The best solution I've found is the useragent library, which allows you to do this:

var useragent = require('useragent');
// for an actual request use: useragent.parse(req.headers['user-agent']);
var agent = useragent.parse('Googlebot-News');

// will log true
console.log(agent.device.toJSON().family === 'Spider')

It is fast and kept up-to-date pretty well, and seems like the best approach. You can run the above script in your browser: runkit
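For the original use case of pre-rendering only for crawler requests, a minimal Express sketch might look like the following. Only the useragent.parse() call and the 'Spider' device-family check come from the answer above; the route handler and the placeholder HTML responses are illustrative assumptions standing in for whatever server-side React rendering you actually use.

var express = require('express');
var useragent = require('useragent');

var app = express();

// Flag requests whose user-agent parses as a known bot/spider.
app.use(function (req, res, next) {
  var agent = useragent.parse(req.headers['user-agent']);
  // The useragent library classifies known crawlers under the 'Spider' device family.
  req.isCrawler = agent.device.toJSON().family === 'Spider';
  next();
});

app.get('*', function (req, res) {
  if (req.isCrawler) {
    // Crawler: send fully pre-rendered HTML (placeholder string here).
    res.send('<html><head><title>Pre-rendered page</title></head><body>...</body></html>');
  } else {
    // Normal visitor: send the client-side shell and let React render in the browser.
    res.send('<html><head><title>App</title></head><body><div id="root"></div><script src="/bundle.js"></script></body></html>');
  }
});

app.listen(3000);

With this in place, only requests identified as crawlers pay the cost of server-side rendering, which is the trade-off the question describes.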

