Why does R 3.6.0 return FALSE when evaluating the expression ("Dogs" < "cats")?
Question
I have some complicated code, but instead of showing you that, I am going to extract the essence of the problem.
Evaluate: "dogs" < "cats"
This should evaluate to FALSE, and it does in R 3.6.
Evaluate: "Dogs" < "cats"
This should evaluate to TRUE, because the ASCII code for "D" is 68 and the ASCII code for "c" is 99. Since 68 < 99, "Dogs" < "cats" should evaluate to TRUE, but it does not in R 3.6.0. However, when I tried the console window on the https://datacamp.com website, the expression "Dogs" < "cats" returned TRUE and the expression "dogs" < "Cats" returned FALSE, as expected.
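The ASCII codes cited in the question can be confirmed in R with utf8ToInt(), which returns a character's Unicode code point (identical to its ASCII code for these letters). This small sketch only checks the numbers; it says nothing about how < actually orders strings.

```r
# utf8ToInt() returns the Unicode code point(s) of a string;
# for plain ASCII letters these equal the ASCII codes.
code_D <- utf8ToInt("D")
code_c <- utf8ToInt("c")
print(c(D = code_D, c = code_c))  # D = 68, c = 99

# Comparing the integer codes directly gives the "ASCII" answer
code_D < code_c                   # TRUE
```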
Hence, my question is: why does R 3.6.0 return FALSE for ("Dogs" < "cats")?
Answer
The interpreter at DataCamp shows:
> Sys.getlocale()
[1] "C"
Whereas mine, and perhaps yours, shows:
> Sys.getlocale()
[1] "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"
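With the "C" locale, characters are compared by their ASCII values, whereas in en_US.UTF-8 the order is aAbBcC and so on. In that locale "D" and "c" are compared essentially as the letters d and c, and since d comes after c, "Dogs" < "cats" is FALSE.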
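A minimal sketch of checking this in your own session, assuming you are free to change the locale temporarily: it reads only the LC_COLLATE category (the part of the locale that governs string comparison), switches it to "C", repeats the comparison, and then restores the original value. Restoring it matters because sort() and order() on character vectors also follow the collation locale. The variable names are illustrative.

```r
# Inspect only the category that controls string collation
old_collate <- Sys.getlocale("LC_COLLATE")
print(old_collate)   # e.g. "en_US.UTF-8" or "C", depending on the system

# Temporarily switch collation to the "C" locale (byte/ASCII order)
invisible(Sys.setlocale("LC_COLLATE", "C"))
in_c_locale <- "Dogs" < "cats"   # TRUE: 'D' (68) comes before 'c' (99)

# Restore the original collation setting
invisible(Sys.setlocale("LC_COLLATE", old_collate))
in_original_locale <- "Dogs" < "cats"  # FALSE under en_US.UTF-8, TRUE if it was "C"

print(c(C = in_c_locale, original = in_original_locale))
```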
As mentioned in the comments, this is explained further in the documentation for relational operators:
Comparison of strings in character vectors is lexicographic within the strings using the collating sequence of the locale in use: see locales. The collating sequence of locales such as en_US is normally different from C (which should use ASCII) and can be surprising. Beware of making any assumptions about the collation order: e.g. in Estonian Z comes between S and T, and collation is not necessarily character-by-character – in Danish aa sorts as a single letter, after z. In Welsh ng may or may not be a single sorting unit: if it is it follows g. Some platforms may not respect the locale and always sort in numerical order of the bytes in an 8-bit locale, or in Unicode code-point order for a UTF-8 locale (and may not sort in the same order for the same language in different character sets). Collation of non-letters (spaces, punctuation signs, hyphens, fractions and so on) is even more problematic.
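Given that warning, code that needs a reproducible, locale-independent ordering can request C-locale (byte) order explicitly. One option, sketched here, is sort(..., method = "radix"), which the ?sort help page describes as sorting character vectors in C-locale order; the vector x is made-up sample data.

```r
x <- c("cats", "Dogs", "apple", "Banana")

# method = "radix" orders character vectors by C-locale (byte) order,
# independent of the current LC_COLLATE setting.
radix_sorted <- sort(x, method = "radix")
print(radix_sorted)  # "Banana" "Dogs" "apple" "cats"

# The default method uses the locale's collation, so this result varies;
# under en_US.UTF-8 it is typically "apple" "Banana" "cats" "Dogs".
locale_sorted <- sort(x)
print(locale_sorted)
```

Uppercase letters have smaller byte values than lowercase ones, so in C order every capitalized word sorts before every lowercase word, which is exactly the behaviour the question expected from <.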