What is a reliable way of getting allowed locale names in R?

Question

I'm trying to find a reliable way of finding locale codes to pass to Sys.setlocale.

The ?Sys.setlocale help page just states that the allowed values are OS dependent, and gives these examples:

Sys.setlocale("LC_TIME", "de")     # Solaris: details are OS-dependent
Sys.setlocale("LC_TIME", "de_DE.utf8")   # Modern Linux etc.
Sys.setlocale("LC_TIME", "de_DE.UTF-8")  # ditto
Sys.setlocale("LC_TIME", "de_DE")  # Mac OS X, in UTF-8
Sys.setlocale("LC_TIME", "German") # Windows

Under Linux, the available locale names can be retrieved using:

locales <- system("locale -a", intern = TRUE)
##  [1] "C"                    "C.utf8"               "POSIX"               
##  [4] "af_ZA"                "af_ZA.utf8"           "am_ET"
##  ...

I don't have Solaris or Mac machines to hand, but I guess the names for those systems can be derived from that list using something like:

library(stringr)
unique(str_split_fixed(locales, "_", 2)[, 1])    #Solaris
unique(str_split_fixed(locales, "\\.", 2)[, 1])  #Mac
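
As a rough illustration of what those two transformations do, here is a base-R sketch run on a small hand-written sample. The `sample_locales` vector is made up for illustration, and whether the results are really the names that Solaris or a Mac accepts is still only a guess.

```r
# Hypothetical sample of `locale -a` output, shortened for illustration.
sample_locales <- c("C", "C.utf8", "POSIX", "af_ZA", "af_ZA.utf8", "am_ET")

# Keep the part before the first underscore (the guessed Solaris form).
solaris_guess <- unique(sub("_.*$", "", sample_locales))

# Keep the part before the first dot (the guessed Mac form).
mac_guess <- unique(sub("\\..*$", "", sample_locales))

print(solaris_guess)
print(mac_guess)
```

Using `sub()` avoids the stringr dependency; the regular expressions simply drop everything from the first `_` or `.` onwards.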

Locales on Windows are much more problematic: they require long names of the form "language_country", for example:

Sys.setlocale("LC_ALL", "German_Germany")

I can't find a reliable reference for the list of locales under Windows. Calling locale -a from the Windows command line fails unless cygwin is installed, and then it returns the same values as under Linux (I'm guessing it's accessing values in a standard C library.)
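
To avoid an error on machines without the `locale` command, the call can be guarded. This is a minimal sketch that assumes the executable, if present, is on the PATH; `list_system_locales` is a hypothetical helper name, not something provided by R.

```r
# Return the locale names reported by `locale -a`, or character(0)
# when no `locale` executable is found on the PATH (e.g. plain Windows).
list_system_locales <- function() {
  if (!nzchar(Sys.which("locale"))) {
    return(character(0))
  }
  # Capture standard output as a character vector; discard standard error.
  system2("locale", "-a", stdout = TRUE, stderr = FALSE)
}

locales <- list_system_locales()
length(locales)
```

An empty result only means that the command was not found; it says nothing about which names `Sys.setlocale` would accept on that system.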

There doesn't seem to be a list of locales packaged with R (I thought there might be something similar to share/zoneinfo/zone.tab, which contains time zone details).

My current best strategy is to browse this webpage from Microsoft and form the name by manipulating the SUBLANG column of the table.

http://msdn.microsoft.com/en-us/library/dd318693.aspx

Some guesswork is needed, for example the locale related to SUBLANG_ENGLISH_UK is English_United Kingdom.

Sys.setlocale("LC_ALL", "English_United Kingdom")

Where there are variants in different alphabets, parentheses are needed.

Sys.setlocale("LC_ALL", "Uzbek (Latin)_Uzbekistan")
Sys.setlocale("LC_ALL", "Uzbek (Cyrillic)_Uzbekistan")

This guesswork wouldn't be too bad, but many locales don't work at all, including most Indian locales.

Sys.setlocale("LC_ALL", "Hindi_India")
Sys.setlocale("LC_ALL", "Tamil_India")
Sys.setlocale("LC_ALL", "Sindhi_Pakistan")
Sys.setlocale("LC_ALL", "Nynorsk_Norway")
Sys.setlocale("LC_ALL", "Amharic_Ethiopia")
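
One way to check such guesses programmatically, rather than one at a time, is to try each candidate and see whether `Sys.setlocale` returns an empty string, which is what it does (with a warning) when the request cannot be honoured. The sketch below assumes that behaviour; `locale_is_valid` is a hypothetical helper name, and it tests a single category (`LC_TIME` by default) so that the original setting is easy to restore.

```r
# Try each candidate name and report whether Sys.setlocale() accepts it.
# The original setting for the category is restored when the function exits.
locale_is_valid <- function(candidates, category = "LC_TIME") {
  old <- Sys.getlocale(category)
  on.exit(Sys.setlocale(category, old), add = TRUE)
  vapply(candidates, function(loc) {
    # A failed request returns "" and raises a warning, which is silenced here.
    res <- suppressWarnings(Sys.setlocale(category, loc))
    nzchar(res)
  }, logical(1))
}

guesses <- c("C", "Hindi_India", "Tamil_India", "English_United Kingdom")
print(locale_is_valid(guesses))
```

The result is a named logical vector, so the names that work on the current machine can be pulled out with `names(which(locale_is_valid(guesses)))`.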

The Windows Region and Language dialog box (Windows\System32\intl.cpl) has a similar but not identical list of available locales, but I don't know where it is populated from.

There are several related questions:
1. Mac and Solaris people: please can you check to see if my code for getting locales works under your OS.
2. Indian/Pakistani/Norwegian/Ethiopian people using Windows: Please can you tell me what Sys.getlocale() returns for you.
3. Other Windows people: Is there any better documentation on which locales are available?

Update: After clicking links in the question that Ben B mentioned, I stumbled across this better list of locales in Windows. By manually changing the locale using the Region and Language dialog and calling Sys.getlocale(), I deduced that Nynorsk is "Norwegian-Nynorsk_Norway". There are still many oddities, for example

Sys.setlocale(, "Inuktitut (Latin)_Canada")

is fine, but

Sys.setlocale(, "Inuktitut (Syllabics)_Canada")

fails (as do most of the Indian languages). Starting R in any of these locales causes a warning, and R's locale to revert to C.

I'm still interested to hear from any Indians, etc., as to what locale you have.

Recommended answer

In answer to your first question, here's the output on my Mac:

> locales <- system("locale -a", intern = TRUE)
> library(stringr)
> unique(str_split_fixed(locales, "\\.", 2)[, 1]) 
 [1] "af_ZA" "am_ET" "be_BY" "bg_BG" "ca_ES" "cs_CZ" "da_DK" "de_AT" "de_CH"
[10] "de_DE" "el_GR" "en_AU" "en_CA" "en_GB" "en_IE" "en_NZ" "en_US" "es_ES"
[19] "et_EE" "eu_ES" "fi_FI" "fr_BE" "fr_CA" "fr_CH" "fr_FR" "he_IL" "hi_IN"
[28] "hr_HR" "hu_HU" "hy_AM" "is_IS" "it_CH" "it_IT" "ja_JP" "kk_KZ" "ko_KR"
[37] "lt_LT" "nl_BE" "nl_NL" "no_NO" "pl_PL" "pt_BR" "pt_PT" "ro_RO" "ru_RU"
[46] "sk_SK" "sl_SI" "sr_YU" "sv_SE" "tr_TR" "uk_UA" "zh_CN" "zh_HK" "zh_TW"
[55] "C"     "POSIX"

I'm not sure what I'm expecting to see with Sys.setlocale() but it doesn't throw any errors:

> Sys.setlocale(locale="he_IL")
[1] "he_IL/he_IL/he_IL/C/he_IL/en_AU.UTF-8"
> Sys.getlocale()
[1] "he_IL/he_IL/he_IL/C/he_IL/en_AU.UTF-8"
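
The slash-separated string is easier to read when each category is queried on its own. The sketch below does that with `Sys.getlocale`; the list of categories is an assumption about what is of interest, not an exhaustive set. If `?Sys.setlocale` is read correctly, setting "LC_ALL" only changes LC_COLLATE, LC_CTYPE, LC_MONETARY and LC_TIME, which would explain why some fields in the output above did not change to he_IL.

```r
# Query a few locale categories individually instead of parsing the
# composite string returned by Sys.getlocale() with no arguments.
categories <- c("LC_COLLATE", "LC_CTYPE", "LC_MONETARY", "LC_NUMERIC", "LC_TIME")
current <- vapply(categories, Sys.getlocale, character(1))
print(current)
```

The result is a character vector named by category, so a single setting can be read with, for example, `current[["LC_TIME"]]`.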
