URI.unescape crashes when trying to convert "%C3%9Fą" to "ßą"

Views: 202 | Published: 2016/11/19 16:26:17 | Tags: ruby encoding character-encoding crash uri

Problem description

I am using URI.unescape to unescape text, but I ran into a weird error:

 # encoding: utf-8
 require('uri')
 URI.unescape("%C3%9Fą")

results in

 C:/Ruby193/lib/ruby/1.9.1/uri/common.rb:331:in `gsub': incompatible character encodings: ASCII-8BIT and UTF-8 (Encoding::CompatibilityError)
    from C:/Ruby193/lib/ruby/1.9.1/uri/common.rb:331:in `unescape'
    from C:/Ruby193/lib/ruby/1.9.1/uri/common.rb:649:in `unescape'
    from exe/fail.rb:3:in `<main>'

Why?

Recommended answer

The implementation of URI.unescape is broken for non-ASCII inputs. The 1.9.3 version looks like this:

def unescape(str, escaped = @regexp[:ESCAPED])
  str.gsub(escaped) { [$&[1, 2].hex].pack('C') }.force_encoding(str.encoding)
end

The regex in use is /%[a-fA-F\d]{2}/, so gsub goes through the string looking for a percent sign followed by two hex digits. In the block, $& is the matched text ('%C3', for example) and $&[1,2] is the matched text without the leading percent sign ('C3'). Then we call String#hex to convert that hexadecimal number to a Fixnum (195) and wrap it in an Array ([195]) so that we can use Array#pack to do the byte mangling for us. The problem is that pack gives us a single binary byte:

> puts [195].pack('C').encoding
ASCII-8BIT
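The steps described above can be replayed one at a time outside of gsub. This is only a minimal sketch: decode_escape is a hypothetical helper name used for illustration, not something the URI library provides.

```ruby
# encoding: utf-8

# Hypothetical helper (not part of URI): decodes one percent-escape
# such as "%C3" the same way the block inside unescape does.
def decode_escape(match)
  hex_digits = match[1, 2]      # drop the leading "%", keep the two hex digits
  byte_value = hex_digits.hex   # String#hex turns the digits into an Integer
  [byte_value].pack('C')        # Array#pack('C') emits one unsigned 8-bit byte
end

byte = decode_escape('%C3')
puts byte.bytes.inspect                    # the raw byte values
puts byte.encoding == Encoding::BINARY     # BINARY is an alias of ASCII-8BIT
puts byte.encoding == Encoding::UTF_8
```

The string that comes back carries no text encoding of its own, which is what sets up the conflict with the UTF-8 input.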

The ASCII-8BIT encoding is also known as "binary" (i.e. plain bytes with no particular encoding). The block returns that byte, String#gsub tries to insert it into the UTF-8 encoded copy of str that it is working on, and you get your error:

incompatible character encodings: ASCII-8BIT and UTF-8 (Encoding::CompatibilityError)

because you can't (in general) just stuff binary bytes into a UTF-8 string, although you can often get away with it:

URI.unescape("%C3%9F")         # Works
URI.unescape("%C3µ")           # Fails
URI.unescape("µ")              # Works, but nothing to gsub here
URI.unescape("%C3%9Fµ")        # Fails
URI.unescape("%C3%9Fpancakes") # Works

One simple fix is to switch the string to binary before trying to decode it:

def unescape(str, escaped = @regexp[:ESCAPED])
  encoding = str.encoding
  str = str.dup.force_encoding('binary')
  str.gsub(escaped) { [$&[1, 2].hex].pack('C') }.force_encoding(encoding)
end
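The method above relies on @regexp, which is internal to the URI parser, so it cannot be pasted into a script as is. The following is a standalone sketch of the same idea. The names UNESCAPE_PATTERN and safe_unescape are made up for illustration, and the pattern is copied from the regex quoted earlier in this answer.

```ruby
# encoding: utf-8

# Hypothetical constant: same pattern as the regex quoted in this answer.
UNESCAPE_PATTERN = /%[a-fA-F\d]{2}/

# Hypothetical helper, not part of the URI library.
def safe_unescape(str)
  encoding = str.encoding
  # Work on a binary copy so the raw bytes from pack('C') are always compatible.
  binary = str.dup.force_encoding('binary')
  decoded = binary.gsub(UNESCAPE_PATTERN) { [$&[1, 2].hex].pack('C') }
  # Relabel the finished byte sequence with the caller's original encoding.
  decoded.force_encoding(encoding)
end

result = safe_unescape("%C3%9Fą")
puts result
puts result.valid_encoding?
```

Because every piece is binary while gsub runs, no compatibility check can fail. The bytes are only reinterpreted as UTF-8 once the whole string has been assembled.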

Another option is to push the force_encoding into the block:

def unescape(str, escaped = @regexp[:ESCAPED])
  encoding = str.encoding
  str.gsub(escaped) { [$&[1, 2].hex].pack('C').force_encoding(encoding) }
end

I'm not sure why the gsub fails in some cases but succeeds in others.
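One plausible explanation for the mixed results is Ruby's encoding compatibility rule. A string that contains only 7-bit ASCII characters can be combined with a string in any ASCII-compatible encoding, but two strings that both contain non-ASCII bytes must share an encoding. The sketch below checks this with Encoding.compatible?; RAW_BYTE and mixes_with_raw_byte? are hypothetical names used only for this illustration.

```ruby
# encoding: utf-8

# A non-ASCII binary byte, like the ones the gsub block produces.
RAW_BYTE = [0xC3].pack('C')

# Hypothetical helper: true when Ruby would allow str and a raw
# binary byte to be combined into one string.
def mixes_with_raw_byte?(str)
  !Encoding.compatible?(str, RAW_BYTE).nil?
end

['%C3%9F', '%C3µ', '%C3%9Fµ', '%C3%9Fpancakes'].each do |input|
  puts "#{input}: ascii_only=#{input.ascii_only?} mixes=#{mixes_with_raw_byte?(input)}"
end
```

Under this reading, the inputs that work are exactly those whose literal text is pure ASCII. A string such as "µ" with no escapes also works because gsub never has to insert a binary byte into it.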
