What are the R sorting rules of character vectors?

Question

R sorts character vectors in a sequence which I describe as alphabetic, not ASCII.

For example:

sort(c("dog", "Cat", "Dog", "cat"))
[1] "cat" "Cat" "dog" "Dog"

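Three questions:

  1. What is the technically correct term to describe this sort order?
  2. I can't find any reference to this in the manuals on CRAN. Where can I find a description of the sorting rules in R?
  3. Does this behaviour differ from that of other languages such as C, Java, Perl or PHP?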

Recommended answer

The Details section of help(sort) states:

 The sort order for character vectors will depend on the collating
 sequence of the locale in use: see ‘Comparison’.  The sort order
 for factors is the order of their levels (which is particularly
 appropriate for ordered factors).
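To see this locale dependence directly, one can compare the default sort with `method = "radix"`, which, according to R's own `?sort` help (R 3.3.0 and later), does not respect the locale and sorts strings in C-locale order. A minimal sketch, assuming R >= 3.3.0 and plain ASCII input:

```r
x <- c("dog", "Cat", "Dog", "cat")

# Default method: the result follows the collating sequence of the
# current locale, e.g. "cat" "Cat" "dog" "Dog" as shown in the question
locale_sorted <- sort(x)

# method = "radix" ignores the locale and sorts in C-locale (byte) order,
# where every upper-case ASCII letter comes before every lower-case one
radix_sorted <- sort(x, method = "radix")
print(radix_sorted)
#> [1] "Cat" "Dog" "cat" "dog"
```

Because the radix result does not depend on the machine's locale, it is a reasonable choice when a reproducible ASCII ordering matters more than a linguistically "natural" one.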

help(Comparison) then says:

 Comparison of strings in character vectors is lexicographic within
 the strings using the collating sequence of the locale in use: see
 ‘locales’.  The collating sequence of locales such as ‘en_US’ is
 normally different from ‘C’ (which should use ASCII) and can be
 surprising.  Beware of making _any_ assumptions about the 
 collation order: e.g. in Estonian ‘Z’ comes between ‘S’ and ‘T’,
 and collation is not necessarily character-by-character - in
 Danish ‘aa’ sorts as a single letter, after ‘z’.  In Welsh ‘ng’
 may or may not be a single sorting unit: if it is it follows ‘g’.
 Some platforms may not respect the locale and always sort in
 numerical order of the bytes in an 8-bit locale, or in Unicode
 point order for a UTF-8 locale (and may not sort in the same order
 for the same language in different character sets).  Collation of
 non-letters (spaces, punctuation signs, hyphens, fractions and so
 on) is even more problematic.

So it depends on your locale setting.
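A minimal sketch of inspecting and temporarily overriding the collation locale in base R. `with_c_collation()` is a hypothetical helper written only for this illustration, not part of R; it assumes the "C" locale is available (it is on standard platforms) and restores the previous `LC_COLLATE` setting when it exits. The withr package's `with_collate()` serves the same purpose if you prefer a maintained utility.

```r
# Which collation locale are sort(), order() and string comparisons using?
print(Sys.getlocale("LC_COLLATE"))

# Hypothetical helper: evaluate `expr` with LC_COLLATE set to "C",
# then restore the previous collation locale, even if an error occurs
with_c_collation <- function(expr) {
  old <- Sys.getlocale("LC_COLLATE")
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)
  Sys.setlocale("LC_COLLATE", "C")
  expr  # lazy evaluation: `expr` is only evaluated here, under "C"
}

x <- c("dog", "Cat", "Dog", "cat")

# In the C locale strings compare by character code (ASCII order)
c_sorted  <- with_c_collation(sort(x))
c_compare <- with_c_collation("cat" < "Dog")  # 'c' (99) > 'D' (68)

print(c_sorted)
#> [1] "Cat" "Dog" "cat" "dog"
print(c_compare)
#> [1] FALSE
```

Relying on R's lazy argument evaluation keeps the helper short: the argument is not evaluated until after the locale has been switched, and `on.exit()` guarantees the original setting comes back.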
