UTF-8 characters mangled in HTTP Basic Auth username

Problem description

I'm trying to build a web service using Ruby on Rails. Users authenticate themselves via HTTP Basic Auth. I want to allow any valid UTF-8 characters in usernames and passwords.

The problem is that the browser is mangling characters in the Basic Auth credentials before it sends them to my service. For testing, I'm using 'カタカナカタカナカタカナカタカナカタカナカタカナカタカナカタカナ' as my username (no idea what it means - AFAIK it's some random characters our QA guy came up with - please forgive me if it is somehow offensive).

If I take that as a string and do username.unpack("h*") to convert it to hex, I get: '3e28ba3e28fb3e28ba3e38a83e28ba3e28fb3e28ba3e38a83e28ba3e28fb3e28ba3e38a83e28ba3e28fb3e28ba3e38a83e28ba3e28fb3e28ba3e38a83e28ba3e28fb3e28ba3e38a83e28ba3e28fb3e28ba3e38a83e28ba3e28fb3e28ba3e38a8'. That seems about right for 32 katakana characters (3 bytes / 6 hex digits each).
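As a rough illustration of the hex conversion above, the sketch below compares the two hex directives of String#unpack on the first character of the test username. The variable names are made up for the example; the point is only that "h*" emits the low nibble of each byte first, so its output is the nibble-swapped form of a conventional hex dump.

```ruby
# First character of the test username (カ, U+30AB), written as an escape
# so the example does not depend on the source file's encoding.
ka = "\u30AB"

# "H*" emits the high nibble of each byte first (the usual hex dump).
hex_high_first = ka.unpack("H*").first

# "h*" emits the low nibble first, so every pair of digits is swapped.
hex_low_first = ka.unpack("h*").first

puts hex_high_first  # e382ab
puts hex_low_first   # 3e28ba
```

The second value matches the first six digits of the string quoted above, so the dump in the question is the UTF-8 encoding with each byte's nibbles swapped.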

If I do the same with the username that's coming in via HTTP Basic auth, I get: 'bafbbaacbafbbaacbafbbaacbafbbaacbafbbaacbafbbaacbafbbaacbafbbaac'. It's obviously much shorter. Using the Firefox Live HTTP Headers plugin, here's the actual header that's being sent:

Authorization: Basic q7+ryqu/q8qrv6vKq7+ryqu/q8qrv6vKq7+ryqu/q8o6q7+ryqu/q8qrv6vKq7+ryqu/q8qrv6vKq7+ryqu/q8o=

That looks like that 'bafbba...' string, with the high and low nibbles swapped (at least when I paste it into Emacs, base64-decode it, then switch to hexl mode). That might be a UTF-16 representation of the username, but I haven't gotten anything to display it as anything but gibberish.
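One way to check what the browser actually sent is to decode the header value from the question and compare it with the code points of the intended username. This is only a sketch: the variable names are illustrative, and the intended username is assumed to be カタカナ repeated eight times, as in the question.

```ruby
require "base64"

# Header value copied from the question (everything after "Basic ").
token = "q7+ryqu/q8qrv6vKq7+ryqu/q8qrv6vKq7+ryqu/q8o6" \
        "q7+ryqu/q8qrv6vKq7+ryqu/q8qrv6vKq7+ryqu/q8o="

# decode64 returns a binary string of the form "username:password".
decoded = Base64.decode64(token)
user_bytes, pass_bytes = decoded.split(":", 2)

# Conventional hex dump (high nibble first) of the received username.
puts user_bytes.unpack("H*").first

# Assumed intended username: カタカナ repeated eight times (32 characters).
intended = "\u30AB\u30BF\u30AB\u30CA" * 8

# Keep only the low byte of each code point, e.g. U+30AB -> 0xAB.
low_bytes = intended.codepoints.map { |cp| cp & 0xFF }

# True when the browser sent exactly those low bytes.
puts user_bytes.bytes == low_bytes
```

With the header from the question the comparison holds: each character arrives as a single byte equal to the low byte of its code point, which is not a complete UTF-16 (or UTF-8) encoding of the name.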

Rails is setting the charset in the Content-Type header to UTF-8, so the browser should be sending in that encoding. I get the correct data for form submissions.

The problem happens in both Firefox 3.0.8 and IE 7.

So... is there some magic sauce for getting web browsers to send UTF-8 characters via HTTP Basic Auth? Am I handling things wrong on the receiving end? Does HTTP Basic Auth just not work with non-ASCII characters?

Solution

"I want to allow any valid UTF-8 characters in usernames and passwords."

Abandon all hope. Basic Authentication and Unicode don't mix.

There is no standard(*) for how to encode non-ASCII characters into a Basic Authentication username:password token before base64ing it. Consequently every browser does something different:

  • Opera uses UTF-8;
  • IE uses the system's default codepage (which you have no way of knowing, other than it's never UTF-8), and silently mangles characters that don't fit into it using the Windows ‘guess a random character that looks a bit like the one you wanted or maybe just not’ secret recipe;
  • Mozilla uses only the lower byte of character codepoints, which has the effect of encoding to ISO-8859-1 and mangling the non-8859-1 characters irretrievably... except when doing XMLHttpRequests, in which case it uses UTF-8;
  • Safari and Chrome encode to ISO-8859-1, and fail to send the authorization header at all when a non-8859-1 character is used.
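The differences listed above can be reproduced by hand. The sketch below is an approximation, not the browsers' actual code: basic_token is a hypothetical helper, and each encoding step simply follows the description in the list. The IE case is left out because it depends on a system codepage that cannot be known in advance.

```ruby
require "base64"

# Hypothetical helper: builds a Basic token from raw credential bytes.
def basic_token(user_bytes, pass_bytes)
  Base64.strict_encode64(user_bytes.b + ":".b + pass_bytes.b)
end

name = "\u30AB\u30BF\u30AB\u30CA"  # カタカナ

# Opera-style: the UTF-8 bytes of the string.
utf8 = name.encode("UTF-8").b

# Mozilla-style (as described above): low byte of each code point only.
low_byte = name.codepoints.map { |cp| cp & 0xFF }.pack("C*")

# Safari/Chrome-style: strict ISO-8859-1. Conversion fails for characters
# outside that set, mirroring the header not being sent at all.
latin1 = begin
  name.encode("ISO-8859-1").b
rescue Encoding::UndefinedConversionError
  nil
end

puts "UTF-8:    #{utf8.unpack('H*').first} -> #{basic_token(utf8, utf8)}"
puts "low byte: #{low_byte.unpack('H*').first} -> #{basic_token(low_byte, low_byte)}"
puts "Latin-1:  #{latin1.nil? ? 'not representable' : latin1.unpack('H*').first}"
```

The same four characters produce different tokens depending on the client, or no token at all, so the server cannot tell from the header alone which encoding was used.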

*: some people interpret the standard to say that either:

  • it should always be ISO-8859-1, because that is the default encoding for raw 8-bit characters included directly in headers; or
  • it should be encoded using RFC 2047 rules, somehow.

But neither of these proposals addresses what goes inside a base64-encoded auth token, and the RFC 2047 reference in the HTTP spec really doesn't work at all: every place it might potentially be used is explicitly disallowed by the ‘atom context’ rules of RFC 2047 itself, even if HTTP headers honoured the rules and extensions of the RFC 822 family, which they don't.

In summary: ugh. There is little-to-no hope of this ever being fixed in the standard or in the browsers other than Opera. It's just one more factor driving people away from HTTP Basic Authentication in favour of non-standard and less-accessible cookie-based authentication schemes. Shame really.
