UTF-8 characters mangled in HTTP Basic Auth username

Problem description

I'm trying to build a web service using Ruby on Rails. Users authenticate themselves via HTTP Basic Auth. I want to allow any valid UTF-8 characters in usernames and passwords.

The problem is that the browser is mangling characters in the Basic Auth credentials before it sends them to my service. For testing, I'm using 'カタカナカタカナカタカナカタカナカタカナカタカナカタカナカタカナ' as my username (no idea what it means - AFAIK it's some random characters our QA guy came up with - please forgive me if it is somehow offensive).

If I take that as a string and do username.unpack("h*") to convert it to hex, I get: '3e28ba3e28fb3e28ba3e38a83e28ba3e28fb3e28ba3e38a83e28ba3e28fb3e28ba3e38a83e28ba3e28fb3e28ba3e38a83e28ba3e28fb3e28ba3e38a83e28ba3e28fb3e28ba3e38a83e28ba3e28fb3e28ba3e38a83e28ba3e28fb3e28ba3e38a8'. That seems about right for 32 katakana characters (3 bytes/6 hex digits per character).
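A note on the hex dump: Ruby's "h*" directive writes the low nibble of each byte first, while "H*" writes the high nibble first, which is the conventional byte-wise hex. A minimal sketch of the difference on a single character follows; the helper names are made up for illustration.

```ruby
# Hex-dump a string with either nibble order of String#unpack.
# Helper names are hypothetical, chosen for this example only.
def hex_low_nibble_first(str)
  str.unpack("h*").first   # "h" = hex string, low nibble first
end

def hex_high_nibble_first(str)
  str.unpack("H*").first   # "H" = hex string, high nibble first
end

ka = "\u30AB" # カ, whose UTF-8 bytes are E3 82 AB
puts hex_low_nibble_first(ka)   # 3e28ba
puts hex_high_nibble_first(ka)  # e382ab
```

With "h*" the UTF-8 bytes E3 82 AB show up as 3e28ba, which is the pattern seen in the dump above.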

If I do the same with the username that's coming in via HTTP Basic auth, I get: 'bafbbaacbafbbaacbafbbaacbafbbaacbafbbaacbafbbaacbafbbaacbafbbaac'. It's obviously much shorter. Using the Firefox Live HTTP Headers plugin, here's the actual header that's being sent:

Authorization: Basic q7+ryqu/q8qrv6vKq7+ryqu/q8qrv6vKq7+ryqu/q8o6q7+ryqu/q8qrv6vKq7+ryqu/q8qrv6vKq7+ryqu/q8o=

That looks like that 'bafbba...' string, with the high and low nibbles swapped (at least when I paste it into Emacs, base 64 decode, then switch to hexl mode). That might be a UTF16 representation of the username, but I haven't gotten anything to display it as anything but gibberish.
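The header can also be decoded directly in Ruby instead of Emacs. The sketch below assumes only the header value quoted above; decode_basic is a hypothetical helper, and String#unpack("m") is used for the base64 step (Base64.decode64 from the standard library should give the same bytes).

```ruby
# Authorization header value copied from the question.
HEADER = "Basic q7+ryqu/q8qrv6vKq7+ryqu/q8qrv6vKq7+ryqu/q8o6" \
         "q7+ryqu/q8qrv6vKq7+ryqu/q8qrv6vKq7+ryqu/q8o="

# Hypothetical helper: strip the "Basic " scheme, base64-decode the token,
# and split it into raw username and password bytes at the first colon.
def decode_basic(header)
  token = header.split(" ", 2).last
  raw = token.unpack("m").first   # binary (ASCII-8BIT) string
  raw.split(":", 2)
end

user, pass = decode_basic(HEADER)
puts user.bytesize               # 32
puts user.unpack("H*").first     # abbfabca repeated 8 times
puts user.unpack("h*").first     # bafbbaac repeated 8 times
puts pass == user                # true
```

The username arrives as 32 bytes, one byte per character rather than three, so two thirds of the UTF-8 data never left the browser.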

Rails is setting the charset in the Content-Type header to UTF-8, so the browser should be sending in that encoding. I get the correct data for form submissions.

The problem happens in both Firefox 3.0.8 and IE 7.

So... is there some magic sauce for getting web browsers to send UTF-8 characters via HTTP Basic Auth? Am I handling things wrong on the receiving end? Does HTTP Basic Auth just not work with non-ASCII characters?

Solution

"I want to allow any valid UTF-8 characters in usernames and passwords."

Abandon all hope. Basic Authentication and Unicode don't mix.

There is no standard(*) for how to encode non-ASCII characters into a Basic Authentication username:password token before base64ing it. Consequently every browser does something different:

  • Opera uses UTF-8;
  • IE uses the system's default codepage (which you have no way of knowing, other than it's never UTF-8), and silently mangles characters that don't fit into it using the Windows ‘guess a random character that looks a bit like the one you wanted or maybe just not’ secret recipe;
  • Mozilla uses only the lower byte of character codepoints, which has the effect of encoding to ISO-8859-1 and mangling the non-8859-1 characters irretrievably... except when doing XMLHttpRequests, in which case it uses UTF-8;
  • Safari and Chrome encode to ISO-8859-1, and fail to send the authorization header at all when a non-8859-1 character is used.
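The behaviours listed above can be approximated in a few lines of Ruby. This is a rough sketch and not browser code: the function names are invented, the IE code-page case is left out because it depends on the client system, and nil stands in for "no header sent".

```ruby
# Opera-style: send the UTF-8 bytes unchanged.
def encode_utf8(str)
  str.encode("UTF-8").b
end

# Mozilla-style, as described above: keep only the low byte of each code point.
def encode_low_byte(str)
  str.codepoints.map { |cp| cp & 0xFF }.pack("C*")
end

# Safari/Chrome-style: strict ISO-8859-1; nil models the missing header.
def encode_latin1(str)
  str.encode("ISO-8859-1").b
rescue Encoding::UndefinedConversionError
  nil
end

# Join already-encoded credentials and base64 them without line breaks.
def basic_token(user_bytes, pass_bytes)
  [user_bytes.b + ":".b + pass_bytes.b].pack("m0")
end

name = "\u30AB\u30BF\u30AB\u30CA" * 8   # カタカナ repeated 8 times
puts encode_utf8(name).bytesize         # 96
puts encode_low_byte(name).bytesize     # 32
puts encode_latin1(name).inspect        # nil
# Prints the same token as the Authorization header in the question:
puts basic_token(encode_low_byte(name), encode_low_byte(name))
```

The low-byte variant reproduces the header from the question exactly, which fits the Mozilla behaviour described above; U+30AB, U+30BF and U+30CA collapse to AB, BF and CA, and the original characters cannot be recovered from those bytes.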

*: some people interpret the standard to say that either:

  • it should always be ISO-8859-1, since that is the default encoding for raw 8-bit characters included directly in headers;
  • it should be encoded using RFC2047 rules, somehow.

But neither of these proposals addresses what goes inside a base64-encoded auth token, and the RFC2047 reference in the HTTP spec really doesn't work at all, since all the places it might potentially be used are explicitly disallowed by the ‘atom context’ rules of RFC2047 itself, even if HTTP headers honoured the rules and extensions of the RFC822 family, which they don't.

In summary: ugh. There is little-to-no hope of this ever being fixed in the standard or in the browsers other than Opera. It's just one more factor driving people away from HTTP Basic Authentication in favour of non-standard and less-accessible cookie-based authentication schemes. Shame really.
