Displaying UTF-8 encoded Chinese characters in R
Question
I am trying to open a UTF-8 encoded .csv file that contains (traditional) Chinese characters in R. For some reason, R sometimes displays the information as Chinese characters and sometimes as Unicode characters.
For example:
data <- read.csv("mydata.csv", encoding="UTF-8")
data
will produce Unicode characters, while:
data <- read.csv("mydata.csv", encoding="UTF-8")
data[,1]
will actually display Chinese characters.
If I turn it into a matrix, it also displays Chinese characters, but if I try to look at the data (with View(data) or fix(data)), it is shown in Unicode again.
I've asked for advice from people who use a Mac (I'm using a PC with Windows 7); some of them got Chinese characters throughout, others didn't. I tried saving the original data as a table instead and reading it into R that way, with the same result. I tried running the script in RStudio, Revolution R, and RGui. I tried to adjust the locale (e.g. to Chinese), but either R didn't let me change it or the result was gibberish instead of Unicode characters.
My current locale is:
"LC_COLLATE=French_Switzerland.1252;LC_CTYPE=French_Switzerland.1252;LC_MONETARY=French_Switzerland.1252;LC_NUMERIC=C;LC_TIME=French_Switzerland.1252"
Any help getting R to display Chinese characters consistently would be greatly appreciated...
Recommended answer
This is not a bug; it is more a misunderstanding of the underlying type conversions (the character type and the factor type) that happen when a data.frame is constructed.
You could start with data <- read.csv("mydata.csv", encoding="UTF-8", stringsAsFactors=FALSE), which makes the columns holding your Chinese text be of the character type rather than factor, so printing them should show what you are expecting.
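As a rough illustration of this suggestion, the sketch below writes a small UTF-8 CSV to a temporary file and reads it back both ways. The file contents and the column names name and value are made up for the example; read.csv with its encoding and stringsAsFactors arguments, class, and Encoding are base R. It only checks the column types and the encoding mark, and makes no promise about what a particular console will display.

```r
# Hypothetical sample data: one header line and one row of traditional Chinese text.
lines <- c("name,value", "\u4e2d\u83ef\u6c11\u65cf,1")

# Write the bytes as UTF-8 without any locale conversion.
path <- tempfile(fileext = ".csv")
con <- file(path, open = "wb")
writeLines(enc2utf8(lines), con, useBytes = TRUE)
close(con)

# Read as plain character vectors, as suggested in the answer.
data <- read.csv(path, encoding = "UTF-8", stringsAsFactors = FALSE)
class(data$name)      # "character"
Encoding(data$name)   # the encoding mark R attached to the string

# For comparison: the same file with strings converted to factors.
as_factor <- read.csv(path, encoding = "UTF-8", stringsAsFactors = TRUE)
class(as_factor$name) # "factor"

unlink(path)
```

Since R 4.0.0 the default for stringsAsFactors is FALSE, so on recent versions the argument mainly documents intent; on older versions, such as those available for Windows 7, it changes the column type.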
@nograpes: similarly, x=c('中華民族');x; y <- data.frame(x, stringsAsFactors=FALSE) and everything should be ok.
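The one-liner in the comment can be expanded into a small check, sketched here under the assumption that you want to compare both settings side by side. The names y_chr and y_fac are invented for the illustration. The last two lines only report the session's character locale, which may also influence how a printed data.frame looks on Windows; they do not change anything.

```r
# Same string as '中華民族', written with Unicode escapes so the script
# itself does not depend on the source file's encoding.
x <- c("\u4e2d\u83ef\u6c11\u65cf")

# Build the data.frame both ways (y_chr and y_fac are hypothetical names).
y_chr <- data.frame(x, stringsAsFactors = FALSE)
y_fac <- data.frame(x, stringsAsFactors = TRUE)

class(y_chr$x)   # "character"
class(y_fac$x)   # "factor"
levels(y_fac$x)  # the original string is stored as a factor level

# Inspect (not modify) the current character-type locale.
Sys.getlocale("LC_CTYPE")
l10n_info()[["UTF-8"]]  # TRUE when the session runs in a UTF-8 locale
```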