Is this the best way to unescape unicode escape sequences in Ruby?

Posted: 2021/7/11 18:50:41 · Tags: ruby, unicode


Question

I have some text that contains Unicode escape sequences like \u003C. This is what I came up with to unescape it:

string.gsub(/\u(....)/) {|m|[$1].pack("H*").unpack("n*").pack("U*")}

Is it correct? That is, it seems to work with my tests, but can someone more knowledgeable spot a problem with it?

Recommended answer

Your regex, /\u(....)/, has some problems.

First of all, \u doesn't work the way you think it does. In 1.9 you'll get an error, and in 1.8 it will just match a single u rather than the \u pair that you're looking for. You should use /\\u/ to find the literal \u that you want.

Secondly, your (....) group is much too permissive: it will let any four characters through, which is not what you want. In 1.9, you want (\h{4}) (four hexadecimal digits), but in 1.8 you'd need ([\da-fA-F]{4}) because \h was introduced in 1.9.

So if you want your regex to work in both 1.8 and 1.9, you should use /\\u([\da-fA-F]{4})/. This gives you the following in 1.8 and 1.9:

>> s = 'Where is \u03bc pancakes \u03BD house? And u1123!'
=> "Where is \\u03bc pancakes \\u03BD house? And u1123!"
>> s.gsub(/\\u([\da-fA-F]{4})/) {|m| [$1].pack("H*").unpack("n*").pack("U*")}
=> "Where is μ pancakes ν house? And u1123!"
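To make the point about the permissive group concrete, here is a small sketch comparing the two capture groups. The sample string and the variable names are made up for illustration; only the two patterns come from the discussion above.

```ruby
# Hypothetical input: single quotes keep the backslashes literal,
# so the string really contains the two characters "\" and "u".
sample = 'ok \u003C but \uzzzz is not an escape'

loose  = /\\u(....)/            # any four characters after \u
strict = /\\u([\da-fA-F]{4})/   # exactly four hex digits after \u

# scan returns one array per match holding the captured group.
loose_hits  = sample.scan(loose).flatten
strict_hits = sample.scan(strict).flatten

p loose_hits   # => ["003C", "zzzz"]
p strict_hits  # => ["003C"]
```

The loose pattern captures "zzzz" as if it were a code point, while the strict pattern leaves that text alone.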

Using pack and unpack to mangle the hex number into a Unicode character is probably good enough, but there may be better ways.
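One possible simplification, offered as a sketch and not as the answer's own method, is to convert the captured digits with String#hex and encode the resulting code point with a single pack("U"). The helper name unescape_unicode is hypothetical.

```ruby
# Hypothetical helper: replaces each literal \uXXXX (four hex digits)
# with the character for that code point.
def unescape_unicode(text)
  text.gsub(/\\u([\da-fA-F]{4})/) do
    # $1 holds the four hex digits; String#hex turns them into an Integer,
    # and pack("U") encodes that code point as UTF-8.
    [$1.hex].pack("U")
  end
end

s = 'Where is \u03bc pancakes \u03BD house? And u1123!'
puts unescape_unicode(s)
```

This replaces the pack("H*").unpack("n*") round trip with one integer conversion. Like the original, it handles each \uXXXX on its own, so two escapes that form a surrogate pair are not combined into one character.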
