Displaying UTF-8 encoded Chinese characters in R


Question

I am trying to open a UTF-8 encoded .csv file in R that contains (traditional) Chinese characters. For some reason, R sometimes displays the information as Chinese characters and sometimes as Unicode escape codes.

For example:

data <- read.csv("mydata.csv", encoding="UTF-8")

data

will produce Unicode escape codes, while:

data <- read.csv("mydata.csv", encoding="UTF-8")

data[,1]

will actually display Chinese characters.

If I turn the data into a matrix, it also displays Chinese characters, but if I try to look at the data with View(data) or fix(data), it is shown as Unicode escape codes again.

I've asked for advice from people who use a Mac (I'm on a PC running Windows 7); some of them got Chinese characters throughout, others didn't. I tried saving the original data as a table instead and reading it into R that way, with the same result. I tried running the script in RStudio, Revolution R, and RGui. I tried to adjust the locale (e.g. to Chinese), but either R didn't let me change it or the result was gibberish instead of Unicode escape codes.
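
Since the behaviour described above differs between machines, it may help to check which locale an R session is actually running under. The following is a minimal sketch using only base R; the variable names are illustrative and the values printed depend entirely on the system.

```r
# Query the character-type category of the locale, which governs how
# R interprets and displays text in the native encoding.
ctype <- Sys.getlocale("LC_CTYPE")
print(ctype)

# l10n_info() reports properties of the current locale, including
# whether it is a UTF-8 locale.
info <- l10n_info()
is_utf8 <- info[["UTF-8"]]
print(is_utf8)
```

On a session like the one in this question (a Windows locale with code page 1252), one would expect the UTF-8 flag to be FALSE, which is the situation in which characters outside the native code page cannot always be displayed directly.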

My current locale is:

"LC_COLLATE=French_Switzerland.1252;LC_CTYPE=French_Switzerland.1252;LC_MONETARY=French_Switzerland.1252;LC_NUMERIC=C;LC_TIME=French_Switzerland.1252"

Any help getting R to display Chinese characters consistently would be greatly appreciated.

Recommended answer

This is not a bug, but rather a misunderstanding of the underlying type conversions (between the character type and the factor type) that take place when a data.frame is constructed.

You could start with data <- read.csv("mydata.csv", encoding="UTF-8", stringsAsFactors=FALSE), which makes the columns containing Chinese text character vectors rather than factors, so printing them should show what you expect.
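
To try this suggestion without the original file, here is a minimal, self-contained sketch. The temporary file, the column names name and value, and the sample text are all assumptions made for illustration; they stand in for mydata.csv, whose contents are not shown in the question.

```r
# Sample text written with \u escapes so the script itself stays ASCII.
txt <- "\u4e2d\u83ef\u6c11\u65cf"

# Write a small CSV as raw UTF-8 bytes to a temporary file
# (a hypothetical stand-in for mydata.csv).
tmp <- tempfile(fileext = ".csv")
writeLines(enc2utf8(c("name,value", paste0(txt, ",1"))), tmp, useBytes = TRUE)

# Read it back, declaring the encoding and keeping strings as character.
data <- read.csv(tmp, encoding = "UTF-8", stringsAsFactors = FALSE)

class(data$name)   # "character"
print(data$name)   # printing the column directly

unlink(tmp)        # remove the temporary file
```

Whether the printed text appears as Chinese characters still depends on the console and locale in use; the sketch only confirms that the column arrives as a character vector with its UTF-8 bytes intact.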

@nograpes: similarly, x = c('中華民族'); x; y <- data.frame(x, stringsAsFactors=FALSE) and everything should be OK.
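
The type difference this answer refers to can be checked directly. The sketch below builds the same one-column data.frame twice, once with each setting of stringsAsFactors; the names y_chr and y_fct are illustrative only.

```r
# The same sample string as in the answer, written with \u escapes.
x <- c("\u4e2d\u83ef\u6c11\u65cf")

# Keep the column as a character vector.
y_chr <- data.frame(x, stringsAsFactors = FALSE)

# Convert the column to a factor instead.
y_fct <- data.frame(x, stringsAsFactors = TRUE)

class(y_chr$x)    # "character"
class(y_fct$x)    # "factor"

# In the factor case the text lives in the levels, and the column
# itself holds integer codes pointing at those levels.
levels(y_fct$x)
as.integer(y_fct$x)   # 1
```

Passing stringsAsFactors explicitly, as the answer does, makes the resulting column type independent of the default in the R version being used.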
