Writing data isn't preserving encoding


Problem description

I have a string as follows:

str <- "ていただけるなら"
Encoding(str) #returns "UTF-8"

I write it to disk:

write.table(str, file="chartest", quote=F, col.names=F, row.names=F)



Now I look at the file in Notepad++, which is set to "UTF-8 without BOM" encoding, and I get this:

<U+3066><U+3044><U+305F><U+3060><U+3051><U+308B><U+306A><U+3089>

What is going wrong in this process? I would like the written text file to display the string as it appears in R.

This is on Windows 7, R version 2.15.

Recommended answer

This is an annoying "feature" of R on Windows. The only solution that I have found so far is to temporarily and programmatically switch your locale to the one required to decode the script of the text in question. So, in the above case, you would use the Japanese locale.

## This won't work on Windows
str <- "ていただけるなら"
Encoding(str) #returns "UTF-8"
write.table(str, file="c:/chartest.txt", quote=F, col.names=F, row.names=F)
## The following should work on Windows - first grab and save your existing locale
print(Sys.getlocale(category = "LC_CTYPE"))
original_ctype <- Sys.getlocale(category = "LC_CTYPE")
## Switch to the appropriate locale for the script
Sys.setlocale("LC_CTYPE","japanese")
## Now you can write your text out and have it look as you would expect
write.table(str, "c:/chartest2.txt", quote = FALSE, col.names = FALSE, 
            row.names = FALSE, sep = "\t", fileEncoding = "UTF-8")
## ...and don't forget to switch back
Sys.setlocale("LC_CTYPE", original_ctype)

The above produces the two files you can see in this screenshot. The first file shows the Unicode code points, which is not what you want, while the second shows the glyphs you would normally expect.
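The save, switch, write, restore sequence above can be wrapped so that the locale is restored even if the write fails. This is only a sketch: `with_ctype` is a hypothetical helper name, not something from base R, and the output path is an assumed temporary file rather than the `c:/` path used above. On systems other than Windows the locale name "japanese" is not expected to be valid, and `Sys.setlocale` will just warn.

```r
# Hypothetical helper (not part of base R): evaluate `expr` while LC_CTYPE
# is temporarily set to `locale`, then restore the previous setting.
with_ctype <- function(locale, expr) {
  original_ctype <- Sys.getlocale(category = "LC_CTYPE")
  # Restore the original locale on exit, even if `expr` raises an error
  on.exit(Sys.setlocale("LC_CTYPE", original_ctype), add = TRUE)
  Sys.setlocale("LC_CTYPE", locale)
  # `expr` is a lazily evaluated argument, so it only runs here,
  # after the locale has been switched
  force(expr)
}

str <- "ていただけるなら"
out_file <- file.path(tempdir(), "chartest2.txt")  # assumed output location

with_ctype("japanese",
           write.table(str, out_file, quote = FALSE, col.names = FALSE,
                       row.names = FALSE, sep = "\t", fileEncoding = "UTF-8"))
```

Using `on.exit` means there is no separate "switch back" step to forget.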

So far nobody has been able to explain to me why this happens in R. It is not an unavoidable feature of Windows because Perl, as I mention in this post, gets round the issue somehow.
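A commonly suggested alternative, which is not part of the answer above, is to bypass R's re-encoding on output altogether. The sketch below assumes the cause is that R translates strings to the session's native encoding when writing text, escaping anything the active code page cannot represent as `<U+XXXX>`; treat that as an assumption, not a confirmed diagnosis. It writes the UTF-8 bytes unchanged with `writeLines(..., useBytes = TRUE)`, and the output path is again an assumed temporary file.

```r
str <- "ていただけるなら"
out_file <- file.path(tempdir(), "chartest3.txt")  # assumed output location

# Open the connection with no re-encoding ("native.enc" is the default),
# then write the string's UTF-8 bytes exactly as they are stored
con <- file(out_file, open = "w", encoding = "native.enc")
writeLines(enc2utf8(str), con, useBytes = TRUE)
close(con)

# Read the file back, declaring its encoding, to check the round trip
readLines(out_file, encoding = "UTF-8")
```

This avoids touching the locale, but it only covers line-oriented output; for a full table you would have to build the lines yourself before writing them.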
