Facebook externalhit_uatext robot lowercasing URLs

Question

I'm working on a site that has mixed-case URLs, similar to YouTube. We generate IDs on the server, and I chose base 62 (digits plus lowercase and uppercase letters) so they would be shorter. A URL might look like example.com/user/123AbCaBc. The Facebook robot seems to be hitting my site regularly with an all-lowercase version, example.com/user/123abcabc. This causes a 404 error because the all-lowercase ID isn't in the database.
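To make the case sensitivity concrete, here is a minimal sketch of a base 62 encoder and decoder. The alphabet order and the function names `encodeBase62` and `decodeBase62` are assumptions for illustration; the question does not show the site's actual ID scheme.

```javascript
// Hypothetical alphabet: digits, then uppercase, then lowercase letters.
// The real site's ordering is not given in the question.
const ALPHABET =
  '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';

// Encode a non-negative integer into a base 62 string.
function encodeBase62(n) {
  let value = BigInt(n);
  if (value < 0n) throw new RangeError('n must be non-negative');
  if (value === 0n) return ALPHABET[0];
  let out = '';
  while (value > 0n) {
    out = ALPHABET[Number(value % 62n)] + out;
    value /= 62n; // BigInt division truncates toward zero
  }
  return out;
}

// Decode a base 62 string back into a BigInt.
function decodeBase62(id) {
  let value = 0n;
  for (const ch of id) {
    const digit = ALPHABET.indexOf(ch);
    if (digit === -1) {
      throw new RangeError(`invalid base 62 character: ${ch}`);
    }
    value = value * 62n + BigInt(digit);
  }
  return value;
}
```

Because uppercase and lowercase letters are distinct digits in this scheme, `123AbCaBc` and `123abcabc` decode to different numbers, which is why a lowercased URL cannot be matched to the original record.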

According to the logs, no other user agents are producing 404s, so this is definitely a robot and not a human. Here's the user agent I'm seeing:

facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)

This happens about once every 4 minutes. I'm not currently logging non-404 hits, so I don't know whether the robot also requests the correct mixed-case version.
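One way to fill that logging gap is to record every crawler hit, not only the 404s. The sketch below is an assumption-laden illustration: the helper names `isFacebookCrawler`, `hasUppercase`, and `describeHit` are made up, and `req` is only assumed to look like a Node.js `http.IncomingMessage` (a `url` string and a lowercase-keyed `headers` object).

```javascript
// True when the User-Agent string identifies the Facebook crawler.
function isFacebookCrawler(userAgent) {
  return (
    typeof userAgent === 'string' &&
    userAgent.toLowerCase().includes('facebookexternalhit')
  );
}

// True when the path contains at least one uppercase ASCII letter.
function hasUppercase(path) {
  return /[A-Z]/.test(path);
}

// Build a log record for one request. `req` is assumed to expose
// `url` and `headers` the way Node's http.IncomingMessage does.
function describeHit(req, statusCode) {
  return {
    crawler: isFacebookCrawler(req.headers['user-agent']),
    path: req.url,
    mixedCase: hasUppercase(req.url),
    range: req.headers.range || null, // Range header, if the client sent one
    status: statusCode,
  };
}
```

Logging the result of `describeHit` for all responses would show whether the crawler first requests the mixed-case URL and whether it sends a Range header.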

The server tech here is Node.js / MongoDB, but I don't see how that is relevant to the issue at hand.

Is there something I can do to fix this on the Facebook side? Is there a problem here, or should I squelch these log errors? Has anyone else had a similar problem?

Recommended answer

It's possible that your Node web server application (are you using Express?) currently doesn't support byte ranges. The Facebook crawler apparently falls back on lowercasing the URL in that case, as described here:

  • https://mail.habari.co.tz/pipermail/linux/2013-June/000180.html

Take a look at the following for ways to solve this:

  • http://derickbailey.com/2014/04/28/check-http-byte-range-request-header-with-nodejs-and-expressjs/
  • http://www.codeproject.com/Articles/813480/HTTP-Partial-Content-In-Node-js
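As a rough illustration of what supporting byte ranges means, the sketch below parses a single-range `Range` header and builds a 200, 206, or 416 response for an in-memory body. The names `parseByteRange` and `buildResponse` are hypothetical, multi-range requests are simply ignored, and this is a simplified sketch rather than a complete implementation of HTTP range requests or the approach taken in the linked articles.

```javascript
// Parse a single-range header such as "bytes=0-99" against a body of
// `size` bytes. Returns {start, end} (inclusive), the string
// 'unsatisfiable', or null when the header is absent or should be ignored.
function parseByteRange(header, size) {
  if (typeof header !== 'string') return null;
  const match = /^bytes=(\d*)-(\d*)$/.exec(header.trim());
  if (!match || (match[1] === '' && match[2] === '')) return null;

  let start;
  let end;
  if (match[1] === '') {
    // Suffix form "bytes=-N": the last N bytes of the body.
    const suffix = Number(match[2]);
    if (suffix === 0) return 'unsatisfiable';
    start = Math.max(size - suffix, 0);
    end = size - 1;
  } else {
    start = Number(match[1]);
    // An end before the start is malformed, so the header is ignored.
    if (match[2] !== '' && Number(match[2]) < start) return null;
    end = match[2] === '' ? size - 1 : Math.min(Number(match[2]), size - 1);
  }
  if (start >= size) return 'unsatisfiable';
  return { start, end };
}

// Build a response description for a Buffer body and an optional
// Range header value.
function buildResponse(body, rangeHeader) {
  const size = body.length;
  const range = parseByteRange(rangeHeader, size);

  if (range === 'unsatisfiable') {
    return {
      status: 416,
      headers: { 'Content-Range': `bytes */${size}` },
      body: Buffer.alloc(0),
    };
  }
  if (range === null) {
    return {
      status: 200,
      headers: { 'Accept-Ranges': 'bytes', 'Content-Length': size },
      body,
    };
  }
  const chunk = body.subarray(range.start, range.end + 1);
  return {
    status: 206,
    headers: {
      'Accept-Ranges': 'bytes',
      'Content-Range': `bytes ${range.start}-${range.end}/${size}`,
      'Content-Length': chunk.length,
    },
    body: chunk,
  };
}
```

In a plain Node.js `http` handler, the result could be sent with `res.writeHead(r.status, r.headers)` followed by `res.end(r.body)`, where `r = buildResponse(pageBuffer, req.headers.range)` and `pageBuffer` is whatever rendered page the route would normally return.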
