UTF-8 support in R on Windows
Question
Since Windows 10 added the new option "Beta: Use Unicode UTF-8 for worldwide language support", I thought it would be possible for R to switch its locale to UTF-8. However, when I try to change the locale to UTF-8 with
Sys.setlocale(locale = "Japanese_Japan.65001")
or
Sys.setlocale(locale = "Japanese_Japan.UTF-8")
I get this warning:
In Sys.setlocale("Japanese_Japan.65001") :
OS reports request to set locale to "Japanese_Japan.65001" cannot be honored
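The failure can also be detected in code, since Sys.setlocale() warns and returns an empty string when the OS rejects the request. The following is only a minimal sketch: the helper name try_set_locale is hypothetical, and restricting the change to LC_CTYPE is an assumption made here to keep the experiment narrow.

```r
# Hypothetical helper: try each candidate locale name for LC_CTYPE and
# return the first one the OS accepts, or NA if none is honored.
try_set_locale <- function(candidates) {
  for (loc in candidates) {
    # Sys.setlocale() warns and returns "" when the request is rejected
    res <- suppressWarnings(Sys.setlocale("LC_CTYPE", loc))
    if (is.character(res) && nzchar(res)) {
      return(res)
    }
  }
  NA_character_
}

# The two locale names tried in the question
result <- try_set_locale(c("Japanese_Japan.65001", "Japanese_Japan.UTF-8"))

if (is.na(result)) {
  message("No UTF-8 locale could be set; LC_CTYPE stays at ",
          Sys.getlocale("LC_CTYPE"))
} else {
  message("LC_CTYPE is now ", result)
}
```

If a candidate is accepted, the locale stays changed for the rest of the session, so this is best run in a throwaway R session.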
For now, does Windows allow R to use UTF-8?
(I am not very familiar with locale issues, so please comment if more information is needed.)
Info
> Sys.getlocale()
[1] "LC_COLLATE=Japanese_Japan.932;LC_CTYPE=Japanese_Japan.932;LC_MONETARY=Japanese_Japan.932;LC_NUMERIC=C;LC_TIME=Japanese_Japan.932"
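Besides Sys.getlocale(), base R's l10n_info() reports whether the running session treats UTF-8 as its native encoding. A short sketch of such a check follows; the variable names are illustrative only, and the codepage element is reported only on Windows, hence the NULL check.

```r
# Inspect the localization settings of the current R session
info <- l10n_info()

# TRUE only when the session's native encoding is UTF-8
is_utf8 <- isTRUE(info[["UTF-8"]])

cat("LC_CTYPE:", Sys.getlocale("LC_CTYPE"), "\n")
cat("Session is UTF-8:", is_utf8, "\n")

# The code page is only present in the result on Windows
if (!is.null(info$codepage)) {
  cat("Code page:", info$codepage, "\n")
}
```

With the locale shown above (code page 932), the UTF-8 flag would be expected to be FALSE.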
Recommended answer
It appears that experimental R binaries have been built that fully support UTF-8 on Windows 10. However, as of 2020-07-30 the project was still marked "experimental", and the official conclusion was:
Based also on this experience, I believe that switching to UCRT is already possible and I expect that building a complete toolchain should take a small number of months. It is I think the only realistic way to support Unicode characters (not representable in native encoding) reliably in R on Windows.
This clearly means that, as of that date, full UTF-8 support in R on Windows was still planned for a somewhat more distant future.