Why does R 3.6.0 return FALSE when evaluating the expression ("Dogs" < "cats")?

Question

I have some complicated code, but instead of showing you that, I am going to extract the essence of the problem.

Evaluate: "dogs" < "cats" … This should evaluate to FALSE, and it does in R 3.6.0.

Evaluate: "Dogs" < "cats" … This should evaluate to TRUE because the ASCII code for "D" is 68 and the ASCII code for "c" is 99. Since 68 < 99, "Dogs" < "cats" should evaluate to TRUE, but it does not in R 3.6.0. However, when I tried the Console window on the https://datacamp.com website, the expression "Dogs" < "cats" returned TRUE and the expression "dogs" < "Cats" returned FALSE, as expected.

Hence, my question is: why does R 3.6.0 return FALSE for ("Dogs" < "cats")?

Recommended answer

The interpreter at DataCamp shows:

> Sys.getlocale()
[1] "C"

Whereas mine, and perhaps yours, shows:

> Sys.getlocale()
[1] "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"

With the "C" locale, characters are compared by their ASCII values, whereas for en_US.UTF-8 the collation order goes aAbBcC and so on. Under that order "D" sorts after "c", so "Dogs" < "cats" is FALSE.
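To see the effect of the locale directly, here is a minimal sketch that temporarily switches only the collation category to "C" and repeats the comparisons from the question. The variable names (old_collate, dogs_lt_cats, and so on) are made up for illustration, and the sketch assumes the session is allowed to change its locale.

```r
# Remember the current collation setting so it can be restored afterwards.
old_collate <- Sys.getlocale("LC_COLLATE")

# Switch only the collation category to "C" (byte/ASCII order).
invisible(Sys.setlocale("LC_COLLATE", "C"))

# In the "C" locale, upper-case letters sort before lower-case letters.
dogs_lt_cats <- "Dogs" < "cats"   # TRUE  ("D" is 68, "c" is 99)
dogs_lt_Cats <- "dogs" < "Cats"   # FALSE ("d" is 100, "C" is 67)

# The code points mentioned in the question.
code_D <- utf8ToInt("D")          # 68
code_c <- utf8ToInt("c")          # 99

# Restore the original collation setting.
invisible(Sys.setlocale("LC_COLLATE", old_collate))

print(c(dogs_lt_cats, dogs_lt_Cats))
```

Changing the locale affects the whole session, so it is worth restoring the previous value once the comparison is done, as the sketch does.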

As mentioned in the comments, this is explained further in the documentation for relational operators:

Comparison of strings in character vectors is lexicographic within the strings using the collating sequence of the locale in use: see locales. The collating sequence of locales such as en_US is normally different from C (which should use ASCII) and can be surprising. Beware of making any assumptions about the collation order: e.g. in Estonian Z comes between S and T, and collation is not necessarily character-by-character – in Danish aa sorts as a single letter, after z. In Welsh ng may or may not be a single sorting unit: if it is it follows g. Some platforms may not respect the locale and always sort in numerical order of the bytes in an 8-bit locale, or in Unicode code-point order for a UTF-8 locale (and may not sort in the same order for the same language in different character sets). Collation of non-letters (spaces, punctuation signs, hyphens, fractions and so on) is even more problematic.
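If the code needs the same answer regardless of the session locale, one option is to avoid the locale-dependent < operator for strings. The sketch below relies on sort() with method = "radix", which orders character vectors as in the C locale; the helper name lt_c is hypothetical and not part of base R.

```r
words <- c("cats", "Dogs", "dogs", "Cats")

# Radix sorting of character vectors uses C-locale order,
# whatever Sys.getlocale() reports.
sorted_c <- sort(words, method = "radix")
print(sorted_c)   # "Cats" "Dogs" "cats" "dogs"

# Hypothetical helper: TRUE when `a` sorts strictly before `b`
# in C-locale order, independent of the current locale.
lt_c <- function(a, b) {
  a != b && identical(sort(c(a, b), method = "radix")[1], a)
}

print(lt_c("Dogs", "cats"))   # TRUE
print(lt_c("dogs", "Cats"))   # FALSE
```

This leaves the session locale untouched, which may be preferable to calling Sys.setlocale() inside library code.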
