Convert Unicode to readable characters in R

Published: 2020/7/13 2:48:24   Tags: r, unicode, utf-8

Problem description

I have a .csv file for which Encoding(data) returns a mix of "unknown" and "UTF-8". The text looks like this:

<U+1042><U+1040><U+1042><U+1040> <U+1019><U+103D><U+102C>\n\n<U+1010><U+102D><U+102F><U+1004><U+1039><U+1038><U+103B><U+1015><U+100A><U+1039><U+1000><U+102D><U+102F><U+101C><U+1032> <U+1000><U+102C><U+1000><U+103C>
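Before converting anything, it may be worth checking which of two situations applies, because the <U+XXXX> form can mean either of them. In the first, R prints genuine characters that the current locale cannot display as <U+XXXX>, even though the string itself is intact. As far as I know, this typically happens on Windows with a non-UTF-8 locale, which was the default before R 4.2. In the second, the data literally contains the characters "<", "U", "+" and so on, and that is the case the accepted answer handles. The mixed Encoding() result fits the second case: R never marks pure-ASCII strings, so literal tokens report "unknown". The vector x below is a hypothetical example with one string of each kind.

```r
# Hypothetical example: the same two Myanmar digits stored two different ways.
x <- c("<U+1042><U+1040>",  # literal token text (pure ASCII)
       "\u1042\u1040")      # real Myanmar characters

Encoding(x)                    # "unknown" for the ASCII string, "UTF-8" for the other
grepl("<U+", x, fixed = TRUE)  # TRUE only where the token text is literal
nchar(x)                       # 16 literal characters vs. 2 real characters
```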

I would like to turn it into a readable format, which in this case is Myanmar language, so something that looks a little like this:

၂၀၂၀မွာတိုင္းျ

Strangely, the text in this data used to be readable in RStudio, but at some point (I don't know when) this changed, and now I only see the <U+XXXX> codes. I have tried these solutions with no success.

Recommended answer

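You can rewrite each <U+XXXX> token as a \uXXXX escape with gsub(), and then decode the escapes with stri_unescape_unicode() from the stringi package: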

library(stringi)

string <- "<U+1042><U+1040><U+1042><U+1040> <U+1019><U+103D><U+102C>\n\n<U+1010><U+102D><U+102F><U+1004><U+1039><U+1038><U+103B><U+1015><U+100A><U+1039><U+1000><U+102D><U+102F><U+101C><U+1032> <U+1000><U+102C><U+1000><U+103C>" 

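# gsub() rewrites each "<U+XXXX>" token (the four characters after "U+") as a "\uXXXX" escape;
# stri_unescape_unicode() then decodes those escapes, and cat() prints the result.
cat(stri_unescape_unicode(gsub("<U\\+(....)>", "\\\\u\\1", string)))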

This prints:

၂၀၂၀ မွာ

တိုင္းျပည္ကိုလဲ ကာကြ
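You can also do this in base R if stringi is not available, or if some tokens have more than four hex digits (for example <U+1F600>, which the "...." pattern would not match). The sketch below is my own assumption and not part of the accepted answer. The names decode_u_tokens() and df are hypothetical, and it assumes the <U+XXXX> tokens are literal text in the data rather than a display artifact. It replaces each token through the replacement form of regmatches(), converting the hex digits with strtoi() and intToUtf8(). It then applies this to every character column of a data frame standing in for the imported CSV.

```r
# Base-R sketch (no stringi): decode literal "<U+XXXX>" tokens into characters.
# Accepts 4 to 6 hex digits, so code points above U+FFFF are handled too.
decode_u_tokens <- function(x) {
  pattern <- "<U\\+([0-9A-Fa-f]{4,6})>"
  matches <- gregexpr(pattern, x)
  regmatches(x, matches) <- lapply(regmatches(x, matches), function(tokens) {
    hex <- sub(pattern, "\\1", tokens)                   # keep only the hex digits
    intToUtf8(strtoi(hex, base = 16L), multiple = TRUE)  # one character per token
  })
  x
}

# Hypothetical stand-in for the imported CSV.
df <- data.frame(
  id   = 1:2,
  text = c("<U+1042><U+1040><U+1042><U+1040> <U+1019><U+103D><U+102C>",
           "no tokens here"),
  stringsAsFactors = FALSE
)

# Decode every character column; other columns are left unchanged.
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], decode_u_tokens)

cat(df$text[1], sep = "\n")
```

One design difference from the stringi version: stri_unescape_unicode() interprets every backslash escape in its input, whereas this sketch only touches the <U+XXXX> tokens. Any backslashes already present in the text are therefore left alone.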
