Writing data isn't preserving encoding


Problem description

I have a string like the following:

str <- "ていただけるなら"
Encoding(str) #returns "UTF-8"

I write it to disk:

write.table(str, file="chartest", quote=F, col.names=F, row.names=F)

Now I look at the file in Notepad++, which is set to UTF-8 without BOM encoding, and I get this:

<U+3066><U+3044><U+305F><U+3060><U+3051><U+308B><U+306A><U+3089>

What is going wrong in this process? I would like the written text file to display the string as it appears in R.

This is on Windows 7, R version 2.15
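A quick way to confirm that the escapes are literally in the file, and not just how Notepad++ renders it, is to read the raw bytes back in R. This is a minimal sketch, assuming the same "chartest" file written above; readBin, file.info and rawToChar are all base R:

## Read the file back byte-for-byte and convert the bytes to a string
bytes <- readBin("chartest", what = "raw", n = file.info("chartest")$size)
rawToChar(bytes)
## On an affected Windows session this prints the literal
## "<U+3066><U+3044>..." text rather than the Japanese characters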

Solution

This is an annoying "feature" of R on Windows. The only solution that I have found so far is to temporarily and programmatically switch your locale to the appropriate one required to decode the script of the text in question. So, in the above case you would use the Japanese locale.

## This won't work on Windows
str <- "ていただけるなら"
Encoding(str) #returns "UTF-8"
write.table(str, file="c:/chartest.txt", quote=F, col.names=F, row.names=F)
## The following should work on Windows - first grab and save your existing locale
print(Sys.getlocale(category = "LC_CTYPE"))
original_ctype <- Sys.getlocale(category = "LC_CTYPE")
## Switch to the appropriate locale for the script
Sys.setlocale("LC_CTYPE","japanese")
## Now you can write your text out and have it look as you would expect
write.table(str, "c:/chartest2.txt", quote = FALSE, col.names = FALSE, 
            row.names = FALSE, sep = "\t", fileEncoding = "UTF-8")
## ...and don't forget to switch back
Sys.setlocale("LC_CTYPE", original_ctype)

The above produces the two files you can see in this screenshot. The first file shows the Unicode code points, which is not what you want, while the second shows the glyphs you would normally expect.

So far nobody has been able to explain to me why this happens in R. It is not an unavoidable feature of Windows because Perl, as I mention in this post, gets round the issue somehow.
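For completeness, here is a sketch of an alternative workaround that avoids switching locales. It is not from the original answer, and the file name c:/chartest3.txt is just an example: open the connection yourself and write the UTF-8 bytes directly with useBytes = TRUE, so R never pushes the string through the native code page.

## Alternative sketch (not from the original answer): write the UTF-8 bytes
## as-is so R skips the native-encoding conversion that produces <U+xxxx>
str <- "ていただけるなら"
con <- file("c:/chartest3.txt", open = "w", encoding = "native.enc")
writeLines(enc2utf8(str), con, useBytes = TRUE)
close(con)

With useBytes = TRUE the bytes stored in the (already UTF-8) string are written verbatim, which is why no locale change is needed.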
