Write UTF-8 files from R

Question

Whereas R seems to handle Unicode characters well internally, I'm not able to output a data frame in R with such UTF-8 Unicode characters. Is there any way to force this?

data.frame(c("hīersumian","ǣmettigan"))->test
write.table(test,"test.txt",row.names=F,col.names=F,quote=F,fileEncoding="UTF-8")

The output text file reads:

hiersumian <U+01E3>mettigan

I am using R version 3.0.2 in a Windows environment (Windows 7).
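Whether the output file actually contains the 8-character text `<U+01E3>` or the real UTF-8 encoding of "ǣ" (the bytes C7 A3) can be checked from the raw bytes, independent of any viewer. The sketch below uses a hypothetical helper, `contains_bytes()`, built on base R's `readBin()` and `grepRaw()`; the two demo files in the temp directory stand in for a correct and a broken output.

```r
# Hypothetical helper: does the file at `path` contain the byte sequence `needle`?
contains_bytes <- function(path, needle) {
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  # fixed = TRUE so "+" and "<" in the escape text are not treated as regex syntax
  length(grepRaw(needle, bytes, fixed = TRUE)) > 0
}

utf8_ae    <- charToRaw(enc2utf8("\u01e3"))  # real UTF-8 encoding of U+01E3: c7 a3
escaped_ae <- charToRaw("<U+01E3>")          # the 8-character ASCII escape text

# Demo files: one holding the real character, one holding the literal escape text.
good <- tempfile(fileext = ".txt")
bad  <- tempfile(fileext = ".txt")
writeBin(charToRaw(enc2utf8("\u01e3mettigan\n")), good)
writeBin(charToRaw("<U+01E3>mettigan\n"), bad)

contains_bytes(good, utf8_ae)     # TRUE
contains_bytes(bad,  utf8_ae)     # FALSE
contains_bytes(bad,  escaped_ae)  # TRUE
```

Pointed at the question's own file, `contains_bytes("test.txt", escaped_ae)` returning TRUE would mean the escape text was written literally, in which case no viewer setting can display "ǣ".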

Edit

It's been suggested in the answers that R is writing the file correctly in UTF-8, and that the problem lies with the software I'm using to view the file. Here's some code where I'm doing everything in R. I'm reading in a text file encoded in UTF-8, and R reads it correctly. Then R writes the file out in UTF-8 and reads it back in again, and now the correct Unicode characters are gone.

read.table("myinputfile.txt",encoding="UTF-8")->myinputfile
myinputfile[1,1]
write.table(myinputfile,"myoutputfile.txt",row.names=F,col.names=F,quote=F,fileEncoding="UTF-8")
read.table("myoutputfile.txt",encoding="UTF-8")->myoutputfile
myoutputfile[1,1]

Console output:

> read.table("myinputfile.txt",encoding="UTF-8")->myinputfile
> myinputfile[1,1]
[1] hīersumian
Levels: hīersumian ǣmettigan
> write.table(myinputfile,"myoutputfile.txt",row.names=F,col.names=F,quote=F,fileEncoding="UTF-8")
> read.table("myoutputfile.txt",encoding="UTF-8")->myoutputfile
> myoutputfile[1,1]
[1] <U+FEFF>hiersumian
Levels: <U+01E3>mettigan <U+FEFF>hiersumian
> 
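The `<U+FEFF>` at the start of `myoutputfile[1,1]` is U+FEFF, the byte-order mark (BOM), stored in UTF-8 as the bytes EF BB BF. `read.table(encoding = "UTF-8")` only marks strings as UTF-8 and does not re-encode or strip anything, so a BOM at the start of a file ends up inside the first field. A minimal sketch, using a demo file in the temp directory, that exposes the BOM and removes it from the first line:

```r
# Demo file: a UTF-8 BOM (EF BB BF) followed by "hīersumian".
path <- tempfile(fileext = ".txt")
writeBin(c(as.raw(c(0xef, 0xbb, 0xbf)),
           charToRaw(enc2utf8("h\u012bersumian\n"))), path)

# encoding = "UTF-8" only marks the strings; the BOM survives as U+FEFF.
lines <- readLines(path, encoding = "UTF-8")
utf8ToInt(lines[1])[1]   # 65279, i.e. 0xFEFF

# Strip a leading U+FEFF from the first line if there is one.
lines[1] <- sub("^\ufeff", "", lines[1])
identical(lines, "h\u012bersumian")   # TRUE
```

R connections also accept the encoding name `"UTF-8-BOM"` for reading (for example `read.table(..., fileEncoding = "UTF-8-BOM")`), which drops a leading BOM instead of keeping it.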

Recommended answer

This "answer" is meant mainly to show that something odd is going on behind the scenes:

It seems that "hīersumian" doesn't even make it into the data frame intact: in every case below, the "ī" is converted to "i".

options("encoding" = "native.enc")
t1 <- data.frame(a = c("hīersumian "), stringsAsFactors=F)
t1
#             a
# 1 hiersumian 

options("encoding" = "UTF-8")
t1 <- data.frame(a = c("hīersumian "), stringsAsFactors=F)
t1
#             a
# 1 hiersumian 

options("encoding" = "UTF-16")
t1 <- data.frame(a = c("hīersumian "), stringsAsFactors=F)
t1
#             a
# 1 hiersumian 
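One way to separate console input from R's own string handling is to build the strings from `\u` escapes, which R's parser decodes itself, so the result does not depend on how the console passes typed characters through the Windows code page. As far as I know, `options("encoding")` only sets the default encoding for connections (files, `source()`), so changing it would not be expected to affect a literal typed at the prompt. A sketch that checks the stored code point directly:

```r
# Build the strings from Unicode escapes: U+012B is "ī", U+01E3 is "ǣ".
t1 <- data.frame(a = c("h\u012bersumian", "\u01e3mettigan"),
                 stringsAsFactors = FALSE)

# The second character of the first string should be 0x12B ("ī"), not 0x69 ("i").
utf8ToInt(enc2utf8(t1$a[1]))[2]   # 299, i.e. 0x12B
```

If this check passes but the printed data frame still shows `<U+...>` escapes, that would point at printing in the native code page rather than at the stored data.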

The following sequence successfully writes "ǣmettigan" to the text file:

t2 <- data.frame(a = c("ǣmettigan"), stringsAsFactors=F)

getOption("encoding")
# [1] "native.enc"

Encoding(t2[,"a"]) <- "UTF-16"

write.table(t2,"test.txt",row.names=F,col.names=F,quote=F)

This does not work with the "encoding" option set to "UTF-8" or "UTF-16", and additionally specifying "fileEncoding" either produces a defective file or no output at all.
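About the `Encoding(t2[,"a"]) <- "UTF-16"` step: `Encoding<-` only relabels a string and never converts it, and it recognises just "latin1", "UTF-8" and "bytes"; any other value, "UTF-16" included, marks the string as "unknown" (native). The sketch below shows this, with `iconv()` for comparison as the function that actually converts between encodings. One plausible reading of why the sequence works, which is an inference and not something the answer demonstrates, is that the relabelled string's UTF-8 bytes are then written out without translation.

```r
x <- "\u01e3mettigan"       # UTF-8 bytes for "ǣmettigan"
before <- charToRaw(x)      # remember the stored bytes

# "UTF-16" is not a recognised mark, so x is relabelled "unknown" (native);
# the bytes themselves are left untouched.
Encoding(x) <- "UTF-16"
Encoding(x)                       # "unknown"
identical(charToRaw(x), before)   # TRUE

# iconv() is what actually converts; toRaw = TRUE returns the bytes.
utf16 <- iconv("\u01e3", from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
utf16                             # e3 01
```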

This is somewhat disappointing, as so far I have managed to fix every Unicode issue one way or another.
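A workaround that avoids `write.table()`'s re-encoding altogether is to write the UTF-8 bytes directly: open the file in binary mode and call `writeLines()` with `useBytes = TRUE`, which per its documentation passes marked strings to the connection byte by byte. This is a minimal sketch for a single character column, not something the answer tested; `test.txt` is a hypothetical output path, and the strings are built from `\u` escapes so the example does not depend on the console's code page.

```r
# One character column, as in the question.
test <- data.frame(a = c("h\u012bersumian", "\u01e3mettigan"),
                   stringsAsFactors = FALSE)

path <- "test.txt"                # hypothetical output path
con <- file(path, open = "wb")    # binary mode: no re-encoding, LF line endings
# enc2utf8() guarantees UTF-8 strings; useBytes = TRUE writes them byte by byte
# instead of translating them to the native encoding first.
writeLines(enc2utf8(test$a), con, useBytes = TRUE)
close(con)
```

Because the file is opened in binary mode, lines end in LF rather than CRLF, also on Windows.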
