UTF-8 file output in R

Question

I'm using R 2.15.0 on 64-bit Windows 7. I would like to write Unicode (CJK) text to a file.

The following code shows that a Unicode character written to a UTF-8 file connection does not behave as I expected:

rty <- file("test.txt",encoding="UTF-8")
write("在", file=rty)
close(rty)
rty <- file("test.txt",encoding="UTF-8")
scan(rty,what=character())
close(rty)

Output of scan:

Read 1 item 
[1] "<U+5728>"

The file was not written with the UTF-8 character itself, but with some kind of ANSI-compatible fallback. Can I make it work right the first time (i.e. get a text file that contains "在"), or can I apply some extra step that converts the output to Unicode, replacing the code string with the proper character?

Thanks.

[More info: the same code behaves properly under Cygwin with R 2.14.2, while R 2.14.2 on Windows 7 is also broken. Is this something on my end?]

Recommended answer

The problem is due to Windows-specific behaviour in R (it uses the default system encoding, or some system write functions; I do not know the specifics, but the behaviour is known).

To write UTF-8 encoded text on Windows, use the useBytes = TRUE option of writeLines; when reading the file back, pass encoding = "UTF-8" to readLines:

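txt <- "在"
# useBytes = TRUE writes the string's bytes as they are, without re-encoding
writeLines(txt, "test.txt", useBytes = TRUE)

# Mark the lines that are read back as UTF-8
readLines("test.txt", encoding = "UTF-8")
[1] "在"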
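
To confirm that the file really holds UTF-8 rather than an escaped fallback, one can inspect its raw bytes. This is a minimal sketch, not part of the original answer. It assumes the string is valid UTF-8 to begin with (enc2utf8 is applied as a precaution), and the tempfile path is only for illustration.

```r
# "\u5728" is the character 在, written as an escape so that the encoding
# of the script file itself does not matter.
txt <- enc2utf8("\u5728")

path <- tempfile(fileext = ".txt")  # illustrative path, not from the original post

# useBytes = TRUE: write the UTF-8 bytes unchanged instead of re-encoding
# them to the native locale encoding.
writeLines(txt, path, useBytes = TRUE)

# Read the file as raw bytes. UTF-8 for U+5728 is e5 9c a8, followed by
# the platform's line ending.
bytes <- readBin(path, what = "raw", n = file.size(path))
print(bytes)

# Read the text back, marking the result as UTF-8.
back <- readLines(path, encoding = "UTF-8")
```

If the first three bytes are not e5 9c a8, the text was re-encoded somewhere between the string and the file.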

Kevin Ushey has written a very good article that goes into much more detail: http://kevinushey.github.io/blog/2018/02/21/string-encoding-and-r/
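
The question also asks whether text that already contains escapes such as <U+5728> can be repaired afterwards. The following is a hedged sketch, not part of the original answer. The helper name decode_u_escapes is hypothetical, and the code assumes the escapes have exactly the <U+XXXX> form shown in the scan output, with 4 to 6 hexadecimal digits.

```r
# Hypothetical helper: replace every "<U+XXXX>" escape in a character
# vector with the Unicode character it stands for.
decode_u_escapes <- function(x) {
  m <- gregexpr("<U\\+[0-9A-Fa-f]{4,6}>", x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(tokens) {
    # Strip the leading "<U+" and the trailing ">" to get the hex digits.
    hex <- substr(tokens, 4L, nchar(tokens) - 1L)
    # Convert each hex code point to a UTF-8 encoded character.
    vapply(hex, function(h) intToUtf8(strtoi(h, 16L)),
           character(1), USE.NAMES = FALSE)
  })
  x
}

fixed <- decode_u_escapes("<U+5728>")
```

The repaired strings can then be written out with writeLines(fixed, "test.txt", useBytes = TRUE), as in the answer above.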
