Convert byte Encoding to unicode

Views: 175 | Published: 2017/8/16 21:44:54 | Tags: r, unicode, encoding

Question

I may not be using the appropriate language in the title. If this needs editing, please feel free.

I want to take a string in which non-ASCII characters have been replaced by "byte" placeholders such as <df> and convert those placeholders back to Unicode characters. Let's say I have:

x <- "bi<df>chen Z<fc>rcher hello world <c6>"

I would like to get back:

"bißchen Zürcher hello world Æ"

I know that if I could get it into this form, it would print to the console as desired:

"bi\xdfchen Z\xfcrcher \xc6"

I tried:

gsub("<([[a-z0-9]+)>", "\\x\\1", x)
## [1] "bixdfchen Zxfcrcher xc6"

Recommended answer

Something like this:

x <- "bi<df>chen Z<fc>rcher hello world <c6>"

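# Locate every two-digit hex placeholder such as <df>
m <- gregexpr("<[0-9a-f]{2}>", x)
codes <- regmatches(x, m)
# Turn each placeholder into the single byte it names
chars <- lapply(codes, function(code) {
    rawToChar(as.raw(strtoi(substr(code, 2, 3), base = 16L)), multiple = TRUE)
})
# Write the bytes back in place of the placeholders
regmatches(x, m) <- chars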
x
# [1] "bi\xdfchen Z\xfcrcher hello world \xc6"
Encoding(x) <- "latin1"
x
# [1] "bißchen Zürcher hello world Æ"

Note that you can't make an escaped character by pasting "\x" to the front of a number. The "\x" isn't really in the string at all; it's just how R chooses to represent the byte on screen. Here we use rawToChar() to turn a number into the character we want.
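To make the point concrete, here is a minimal sketch contrasting the two approaches on a shortened input. The variable names (attempt, byte) are illustrative only and do not come from the original post.

```r
x <- "bi<df>chen"

# In a gsub() replacement, "\\x" is a backslash followed by the letter "x".
# The regex engine emits a plain "x", so no byte is ever created.
attempt <- gsub("<([0-9a-f]+)>", "\\x\\1", x)
attempt
# [1] "bixdfchen"

# Building the byte from its numeric value does produce a single byte.
byte <- rawToChar(as.raw(strtoi("df", base = 16L)))
nchar(byte, type = "bytes")
# [1] 1
charToRaw(byte)
# [1] df
```

The first result still contains the three ordinary characters "x", "d" and "f", while the second is one byte with the value 0xdf.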

I tested this on a Mac, so I had to set the encoding to "latin1" to see the correct symbols in the console. A single byte like that isn't valid UTF-8 on its own.
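If the goal is a string that is actually stored as UTF-8 rather than only marked as latin1, one possible follow-up step is to re-encode it with iconv(). This is a sketch under the assumption that the bytes really are latin1; the names latin1_str and utf8_str are hypothetical.

```r
# Bytes for "bi" followed by 0xdf, the latin1 code for the sharp s
latin1_str <- rawToChar(as.raw(c(0x62, 0x69, 0xdf)))

# A lone 0xdf byte is not a complete UTF-8 sequence
validUTF8(latin1_str)
# [1] FALSE

# Re-encode from latin1 to UTF-8; the sharp s becomes the two bytes c3 9f
utf8_str <- iconv(latin1_str, from = "latin1", to = "UTF-8")
charToRaw(utf8_str)
# [1] 62 69 c3 9f
Encoding(utf8_str)
# [1] "UTF-8"
```

Setting Encoding(x) <- "latin1" only changes how R interprets the existing bytes, whereas iconv() rewrites the bytes themselves.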
