What's the difference between hex code (\x) and unicode (\u) chars?

Views: 4046  Posted: 2018/6/7 16:42:28  Tags: r unicode hex


Question

From ?Quotes:

    \xnn    character with given hex code (1 or 2 hex digits)
    \unnnn  Unicode character with given code (1--4 hex digits)

In the case where the Unicode character has only one or two digits, I would expect these characters to be the same. In fact, one of the examples on the ?Quotes help page shows:

    "\x48\x65\x6c\x6c\x6f\x20\x57\x6f\x72\x6c\x64\x21"
    ## [1] "Hello World!"
    "\u48\u65\u6c\u6c\u6f\u20\u57\u6f\u72\u6c\u64\u21"
    ## [1] "Hello World!"

However, under Linux, when trying to print a pound sign, I see

    cat("\ua3")
    ## £
    cat("\xa3")
    ## �

That is, the \x hex code fails to display correctly. (This behaviour persisted with any locale that I tried.) Under Windows 7, both versions show a pound sign.

If I convert to integer and back, then the pound sign displays correctly under Linux.

    cat(intToUtf8(utf8ToInt("\xa3")))
    ## £

Incidentally, this doesn't work under Windows, since utf8ToInt("\xa3") returns NA. Some \x characters return NA under Windows but throw an error under Linux. For example:

    utf8ToInt("\xf0")
    ## Error in utf8ToInt("\xf0") : invalid UTF-8 string

("\uf0" is a valid character.)
These examples show that there are some differences between the \x and \u forms of characters, which seem to be OS-specific, but I can't see any logic in how they are defined. What is the difference between these two character forms?

Solution

The escape sequence \xNN inserts the raw byte NN into a string, whereas \uNN inserts the UTF-8 bytes for the Unicode code point NN into a UTF-8 string:

    > charToRaw('\xA3')
    [1] a3
    > charToRaw('\uA3')
    [1] c2 a3

These two types of escape sequence cannot be mixed in the same string:

    > '\ua3\xa3'
    Error: mixing Unicode and octal/hex escapes in a string is not allowed

This is because the escape sequences also define the encoding of the string. A \uNN sequence explicitly sets the encoding of the entire string to "UTF-8", whereas \xNN leaves it in the default "unknown" (a.k.a. native) encoding:

    > Encoding('\xa3')
    [1] "unknown"
    > Encoding('\ua3')
    [1] "UTF-8"

This becomes important when printing strings, as they need to be converted into the appropriate output encoding (e.g., that of your console). Strings with a defined encoding can be converted appropriately (see enc2native), but those with an "unknown" encoding are simply output as-is:

On Linux, your console is probably expecting UTF-8 text, and as 0xA3 is not a valid UTF-8 sequence, it gives you "�".

On Windows, your console is probably expecting Windows-1252 text, and as 0xA3 is the correct encoding for "£", that's what you see. (When the string is \uA3, a conversion from UTF-8 to Windows-1252 takes place.)

If the encoding is set explicitly, the appropriate conversion will take place on Linux:

    > s <- '\xa3'
    > Encoding(s) <- 'latin1'
    > cat(s)
    £
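The distinction described in the answer can be checked without relying on what the console displays. The following is a minimal sketch, assuming a reasonably recent R session (validUTF8 needs R 3.3.0 or later). The variable names x_form, u_form, s and converted are illustrative and do not come from the original answer.

```r
# Illustrative names; not part of the original answer.
x_form <- "\xa3"   # one raw byte, encoding left as "unknown"
u_form <- "\ua3"   # code point U+00A3, stored as UTF-8 bytes

# The two strings hold different bytes.
charToRaw(x_form)  # a3
charToRaw(u_form)  # c2 a3

# Only the \u form carries an encoding mark.
Encoding(x_form)   # "unknown"
Encoding(u_form)   # "UTF-8"

# The lone byte 0xA3 is not a valid UTF-8 sequence, which is why a
# UTF-8 console cannot render it.
validUTF8(x_form)  # FALSE
validUTF8(u_form)  # TRUE

# Declaring the byte as latin1 lets R convert it properly.
s <- x_form
Encoding(s) <- "latin1"
converted <- enc2utf8(s)

# After conversion, the bytes match the \u form.
identical(charToRaw(converted), charToRaw(u_form))  # TRUE
```

Comparing raw bytes rather than printed output keeps the check independent of the locale and console encoding, which is the source of the OS-specific behaviour in the question.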
