Export UTF-8 BOM to .csv in R

Problem description

I am reading a file through RJDBC from a MySQL database, and R displays all letters correctly (e.g., נווה שאנן). However, even when I export it with write.csv and fileEncoding="UTF-8", the output looks like <U+0436>.<U+043A>. <U+041B><U+043E><U+0437><U+0435><U+043D><U+0435><U+0446> (this example is a Bulgarian string, not the one above) for Bulgarian, Hebrew, Chinese, and so on. Other special characters such as ã and ç work fine.
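
Each <U+xxxx> sequence is R's escape for a Unicode code point that could not be represented in the native encoding; on a German Windows that is presumably code page 1252, which contains ã and ç but no Cyrillic or Hebrew letters, which would explain why only some characters break. The short sketch below uses the first letter of the Bulgarian example to contrast the escape with the two UTF-8 bytes a correctly encoded file would contain instead.

```r
ch <- "\u0436"                    # Cyrillic small letter zhe, as in the escape above
sprintf("U+%04X", utf8ToInt(ch))  # "U+0436": the code point that appears in the escape
charToRaw(enc2utf8(ch))           # d0 b6: the UTF-8 bytes a correct file would contain
```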

I suspect this is caused by the UTF-8 BOM, but I could not find a solution online.

My OS is a German Windows 7.

I tried

con<-file("file.csv",encoding="UTF-8")
write.csv(x,con,row.names=FALSE)

and the (as far as I know) equivalent write.csv(x, file="file.csv", fileEncoding="UTF-8", row.names=FALSE).

Recommended answer

On the help page for Encoding (help("Encoding")) you can read about the special encoding "bytes".
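
As a quick illustration of what that marking does: setting Encoding(x) <- "bytes" changes only the encoding R declares for the string, not the bytes stored in it. The sketch below writes the first word of the Hebrew string with \u escapes so that it does not depend on the encoding of the source file.

```r
x <- "\u05e0\u05d5\u05d5\u05d4"   # "נווה", written with \u escapes
before <- charToRaw(x)             # its UTF-8 bytes: d7 a0 d7 95 d7 95 d7 94
Encoding(x)                        # "UTF-8": the declared encoding
Encoding(x) <- "bytes"             # change only the declaration
Encoding(x)                        # "bytes"
identical(charToRaw(x), before)    # TRUE: the stored bytes are untouched
```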

Using this, I was able to generate a CSV file as follows:

v <- "נווה שאנן"
X <- data.frame(v1=rep(v,3), v2=LETTERS[1:3], v3=0, stringsAsFactors=FALSE)

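# Mark the UTF-8 strings as "bytes" so write.csv writes them unchanged
# instead of converting them to the native (Windows) encoding
Encoding(X$v1) <- "bytes"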
write.csv(X, "test.csv", row.names=FALSE)

注意factorcharacter之间的区别.以下应该可以工作:

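Take care with the difference between factor and character columns: a factor stores its text in its levels, so that is where the encoding has to be changed. The following should work for both: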

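# Character columns containing UTF-8 strings: mark them as "bytes"
# (any() is needed because && only accepts length-one operands)
id_characters <- which(sapply(X,
    function(x) is.character(x) && any(Encoding(x) == "UTF-8")))
for (i in id_characters) Encoding(X[[i]]) <- "bytes"

# Factor columns: the strings live in the levels, so mark those instead
id_factors <- which(sapply(X,
    function(x) is.factor(x) && any(Encoding(levels(x)) == "UTF-8")))
for (i in id_factors) Encoding(levels(X[[i]])) <- "bytes"

write.csv(X, "test.csv", row.names=FALSE)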
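
The code above writes plain UTF-8 without a byte order mark. If a BOM is actually wanted, for example so that Excel recognizes the file as UTF-8, one option is to write the three BOM bytes (EF BB BF) to a binary connection and then let write.csv append the table to that same connection. This is a minimal sketch: write_csv_utf8_bom is a hypothetical helper name, and it assumes the text write.csv produces is already UTF-8 (a UTF-8 session, or character columns marked "bytes" as above).

```r
# Hypothetical helper: write a UTF-8 BOM, then the CSV body, to one file.
# Assumes the strings in df are written as UTF-8 (see the assumption above).
write_csv_utf8_bom <- function(df, path) {
  con <- file(path, open = "wb")              # binary mode: bytes go out unchanged
  on.exit(close(con))
  writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)  # the UTF-8 byte order mark
  write.csv(df, con, row.names = FALSE)       # table text follows the BOM
  invisible(path)
}

# Usage
df <- data.frame(v1 = c("a", "b"), v2 = 1:2, stringsAsFactors = FALSE)
path <- tempfile(fileext = ".csv")
write_csv_utf8_bom(df, path)
readBin(path, "raw", 3)                      # ef bb bf
read.csv(path, fileEncoding = "UTF-8-BOM")   # the BOM is stripped when reading back
```

Because the connection is opened in binary mode, lines end in "\n" even on Windows.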
