UTF-8 encoding not used although it is set in source()


Problem description

I don't understand what is going on here (working with RStudio on a Windows platform):

Save the script test_abc.R:

a <- "ä"
b <- "ü"
c <- "ö"

Then run the following script, Test.R:

compare_text <- function() {
  l <- list()
  if (a != a2) {
    l[[1]] <- c(a, a2)
  }
  if (b != b2) {
    l[[1]] <- c(b, b2)
  }
  if (c != c2) {
    l[[1]] <- c(c, c2)
  }
}

a <- "ä"
b <- "ü"
c <- "ö"
a2 <- "ä"
b2 <- "ü"
c2 <- "ö"

out_text <- compare_text()
# The next active "source-line" overwrites a, b and c!
source("path2/test2_abc.R") # called "V1" OR
# source("path2/test2_abc.R", encoding = "UTF-8") # called "V2"
out_text2 <- compare_text()
print(out_text)
print(out_text2)

If you run the script test.R in version V1, you get

source('~/Desktop/test1.R', encoding = 'UTF-8')
# NULL
# [1] "ö" "ö"

although it states that it is run using UTF-8 encoding. If you run the script test.R in version "V2", you get

source('~/Desktop/test1.R', encoding = 'UTF-8') 
# NULL
# NULL

I don't know whether that related post is helpful.

Recommended answer

In V1 you source a file without specifying the encoding of that file (test_abc.R). The "Encoding" section of the help page for source says:

By default the input is read and parsed in the current encoding of the R session. This is usually what it required, but occasionally re-encoding is needed, e.g. if a file from a UTF-8-using system is to be read on Windows (or vice versa).

The umlauts cannot be read correctly, so c != c2 is TRUE. compare_text then returns c(c, c2), because the value of that last if statement becomes the function's return value.
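Why the V1 output is a bare character vector rather than a list can be seen in a reduced sketch. The function last_if_value below is hypothetical and not part of the question's scripts; it only mirrors the shape of compare_text, where the list is filled but never returned explicitly.

```r
# Hypothetical reduction of compare_text(): the list `l` is never returned,
# so the function's value is whatever the final `if` statement evaluates to.
last_if_value <- function(differs) {
  l <- list()
  if (differs) {
    l[[1]] <- c("x", "y")  # an assignment evaluates to its right-hand side
  }
}

res_true  <- last_if_value(TRUE)   # condition met: value of the assignment
res_false <- last_if_value(FALSE)  # condition not met: NULL

print(res_true)
# [1] "x" "y"
print(res_false)
# NULL
```

Ending the function with an explicit return(l) would give a list in both cases, but it would also change the output shown in the question.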

In V2 the umlauts are read correctly and compare_text returns NULL (no difference is found).
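The difference between the two readings can be reproduced without any file. The sketch below is an illustration under stated assumptions, not what source() does internally: it takes the two bytes that UTF-8 uses for "ä" and decodes them once as UTF-8 and once as a single-byte encoding. It uses "latin1" as a stand-in for code page 1252, since the two agree on these particular bytes. The variable names are made up for this example.

```r
# "ä" (U+00E4) is stored in UTF-8 as the two bytes c3 a4.
utf8_bytes <- as.raw(c(0xc3, 0xa4))

# Reading as in V2: the bytes are declared to be UTF-8.
as_utf8 <- rawToChar(utf8_bytes)
Encoding(as_utf8) <- "UTF-8"

# Reading as in V1 on a CP1252 session: each byte is taken as one character.
# "latin1" stands in for CP1252 here (identical for bytes c3 and a4).
as_single_byte <- iconv(rawToChar(utf8_bytes), from = "latin1", to = "UTF-8")

expected <- intToUtf8(228L)  # "ä" built from its code point

# Code points show what each reading produced.
print(utf8ToInt(as_utf8))         # one code point: 228
print(utf8ToInt(as_single_byte))  # two code points: 195 164

print(as_utf8 == expected)         # TRUE, like V2
print(as_single_byte != expected)  # TRUE, like c != c2 in V1
```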

It is R itself that reads the file inside source(), and by default it uses the native encoding of the session, which it takes from the operating system. On Windows this is (mostly?) Windows code page 1252, which differs from UTF-8. You can check it on your machine with Sys.getlocale(). That is why you have to tell R that the file you want to source is encoded in UTF-8.
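A minimal sketch of that check, followed by a source() call with an explicit encoding. The temporary file and the names script, env and same are assumptions made for this example and do not come from the question. The source() step is only run when the session's native encoding can represent "ä", which is assumed to hold for UTF-8 and Latin-1/CP1252 sessions.

```r
# Inspect the locale that R derived from the operating system.
ctype <- Sys.getlocale("LC_CTYPE")
info <- l10n_info()
native_is_utf8 <- isTRUE(info[["UTF-8"]])
cat("LC_CTYPE:", ctype, "\n")
cat("Native encoding is UTF-8:", native_is_utf8, "\n")

# Write a one-line script as raw UTF-8 bytes, independent of the session locale.
script <- tempfile(fileext = ".R")
code_line <- paste0('a <- "', intToUtf8(228L), '"')
writeBin(c(charToRaw(code_line), as.raw(0x0a)), script)

# Source it with the file's encoding stated explicitly (the V2 variant).
# Assumption: skipped in locales that cannot represent "ä" (for example "C").
same <- NA
if (native_is_utf8 || isTRUE(info[["Latin-1"]])) {
  env <- new.env()
  source(script, local = env, encoding = "UTF-8")
  same <- env$a == intToUtf8(228L)
}
print(same)
```

If native_is_utf8 is FALSE, as on the Windows setup described above, omitting the encoding argument leaves R to read the UTF-8 file in the native encoding, which is the V1 situation.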
