Why does R 3.6.0 return FALSE when evaluating the expression ("Dogs" < "cats")?

Posted: 2020/7/30 19:04:13 · Tags: r, case, ascii


Question


I have some complicated code, but instead of showing you that, I am going to extract the essence of the problem.


Evaluate: "dogs" < "cats" … This should evaluate to FALSE and it does in R 3.6.


Evaluate: "Dogs" < "cats" … This should evaluate to TRUE because the ASCII code for "D" is 68 and the ASCII code for "c" is 99. Since 68 < 99, "Dogs" < "cats" should evaluate to TRUE, but it does not in R 3.6.0. However, when I tried the Console window on the https://datacamp.com website, the expression "Dogs" < "cats" returned TRUE and the expression "dogs" < "Cats" returned FALSE, as expected.


Hence, my question is: why does R 3.6.0 return FALSE for ("Dogs" < "cats")?

Answer


The interpreter at DataCamp shows:

> Sys.getlocale()
[1] "C"

Whereas mine, and perhaps yours, shows:

> Sys.getlocale()
[1] "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"


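With the "C" locale, characters are compared by their ASCII (byte) values, so "D" (68) sorts before "c" (99). With en_US.UTF-8, letters collate as a A b B c C and so on: the letters themselves are compared first and case only breaks ties, so "cats" sorts before "Dogs" and "Dogs" < "cats" is FALSE.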
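The difference can be reproduced in a single R session by switching only the collation category. The sketch below uses a hypothetical helper, `compare_in_locale()` (not part of base R), that temporarily sets `LC_COLLATE` with `Sys.setlocale()` and restores the previous value afterwards. It assumes the `"C"` locale is available (it always is in R) and that `"en_US.UTF-8"` is installed, which is typical on Linux and macOS but not guaranteed; Windows uses different locale names.

```r
# Hypothetical helper: evaluate a < b with LC_COLLATE temporarily set to `locale`.
# "en_US.UTF-8" is a typical Linux/macOS locale name and may not be installed
# everywhere; in that case the helper returns NA instead of a comparison result.
compare_in_locale <- function(locale, a = "Dogs", b = "cats") {
  old <- Sys.getlocale("LC_COLLATE")
  # Restore the caller's collation locale however the function exits
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)
  # Sys.setlocale() returns "" (with a warning) if the locale is unavailable
  if (suppressWarnings(Sys.setlocale("LC_COLLATE", locale)) == "") {
    return(NA)
  }
  a < b
}

compare_in_locale("C")            # TRUE: byte order, "D" (68) < "c" (99)
compare_in_locale("en_US.UTF-8")  # typically FALSE: "c" sorts before "d" regardless of case
```

Only `LC_COLLATE` affects how `<` orders strings, so changing just that category leaves the rest of the session's locale settings alone.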


As mentioned in the comments, this is explained further in the documentation for relational operators:


Comparison of strings in character vectors is lexicographic within the strings using the collating sequence of the locale in use: see locales. The collating sequence of locales such as en_US is normally different from C (which should use ASCII) and can be surprising. Beware of making any assumptions about the collation order: e.g. in Estonian Z comes between S and T, and collation is not necessarily character-by-character – in Danish aa sorts as a single letter, after z. In Welsh ng may or may not be a single sorting unit: if it is it follows g. Some platforms may not respect the locale and always sort in numerical order of the bytes in an 8-bit locale, or in Unicode code-point order for a UTF-8 locale (and may not sort in the same order for the same language in different character sets). Collation of non-letters (spaces, punctuation signs, hyphens, fractions and so on) is even more problematic.
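Given that warning, code that needs the same ordering on every machine can request byte/code-point order explicitly instead of relying on the session locale. A minimal sketch: `sort()` with `method = "radix"` collates character vectors in the C locale regardless of the session locale, and the hypothetical helper `codepoint_less()` (illustrative only, not a base R function) compares two strings by their Unicode code points via `utf8ToInt()`, assuming both are valid UTF-8 (plain ASCII is fine).

```r
# Hypothetical helper: TRUE if `a` precedes `b` in Unicode code-point order,
# independent of LC_COLLATE. Assumes single, valid UTF-8 strings.
codepoint_less <- function(a, b) {
  x <- utf8ToInt(a)
  y <- utf8ToInt(b)
  n <- min(length(x), length(y))
  for (i in seq_len(n)) {
    # The first differing code point decides the order
    if (x[i] != y[i]) return(x[i] < y[i])
  }
  # One string is a prefix of the other: the shorter one sorts first
  length(x) < length(y)
}

words <- c("cats", "Dogs", "apple", "Banana")
sort(words, method = "radix")   # "Banana" "Dogs" "apple" "cats" in any locale
codepoint_less("Dogs", "cats")  # TRUE: 68 < 99
codepoint_less("dogs", "Cats")  # FALSE: 100 > 67
```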
