Linux sort order that I don't understand
Question
I noticed the following sort outputs. Does anyone understand why the '.' is sorted first in the first run but last in the second?
I was trying to debug a program that looks up lines in a large sorted file, but the culprit seems to be my expectation/understanding of Linux sort.
$ sort --debug
sort: using ‘en_US.UTF-8’ sorting rules
/mnt/x/E
/mnt/x/.
<ctrl-D>
/mnt/x/.
________
/mnt/x/E
________
$ sort --debug
sort: using ‘en_US.UTF-8’ sorting rules
/mnt/x/Ed
/mnt/x/.T
<ctrl-D>
/mnt/x/Ed
_________
/mnt/x/.T
_________
$
Recommended answer
It's not that "." sorts before or after other characters: in this locale it is effectively ignored when the lines are compared, so the order is decided by the alphabetic characters alone.
In your first example, <end-of-string> sorts before E; in the second example, E sorts before T.
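The comparison described above can be approximated by stripping everything that is not a letter or digit and looking at what remains. This is only a rough sketch of the idea, not how sort implements locale collation; strip_key is a hypothetical helper introduced for illustration.

```shell
# Hypothetical helper: drop every character that is not an ASCII letter,
# digit, or newline, to mimic a comparison that skips punctuation.
strip_key() {
    printf '%s\n' "$1" | LC_ALL=C tr -cd 'A-Za-z0-9\n'
}

strip_key '/mnt/x/.'    # prints: mntx
strip_key '/mnt/x/E'    # prints: mntxE
strip_key '/mnt/x/Ed'   # prints: mntxEd
strip_key '/mnt/x/.T'   # prints: mntxT
```

With the punctuation gone, mntx is a prefix of mntxE, so it sorts first, while mntxEd sorts before mntxT because E precedes T.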
This behaviour depends on the locale's collation settings. You can influence it with environment variables such as LC_COLLATE:
$ env LC_COLLATE=C sort
/mnt/x/Ed
/mnt/x/.T
^D
/mnt/x/.T
/mnt/x/Ed
$ env LC_COLLATE=en_US.UTF-8 sort
/mnt/x/Ed
/mnt/x/.T
^D
/mnt/x/Ed
/mnt/x/.T
$
Under the C locale, all ASCII characters are considered and are sorted in their ASCII order. In many other locales punctuation is ignored, which is presumably what causes the behaviour you're seeing.
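For a lookup program that expects plain byte order, one option is to force the C locale on the sort invocation itself. The sketch below uses LC_ALL rather than LC_COLLATE because LC_ALL, when set, takes precedence over the individual LC_* variables; the sample lines are the ones from the question.

```shell
# Sort in byte order regardless of the caller's locale settings.
sorted=$(printf '%s\n' '/mnt/x/Ed' '/mnt/x/.T' | LC_ALL=C sort)
printf '%s\n' "$sorted"
# prints:
# /mnt/x/.T
# /mnt/x/Ed

# sort -c exits non-zero if its input is not already in order, which makes
# it a cheap check before handing a file to a lookup tool.
printf '%s\n' "$sorted" | LC_ALL=C sort -c && echo "in C order"
```

Whichever locale is chosen, the file has to be sorted under the same rules the lookup program uses when it compares lines.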
You can examine your locale settings using the locale command.
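When reading that output, it helps to know which variable wins for collation. As a rough sketch, assuming the usual precedence (LC_ALL, then LC_COLLATE, then LANG) and assuming a fallback to C when none is set, the effective setting can be worked out as follows; effective_collate is a hypothetical helper, not a standard command.

```shell
# Hypothetical helper: report which locale name governs collation.
# Empty variables are treated the same as unset ones.
effective_collate() {
    printf '%s\n' "${LC_ALL:-${LC_COLLATE:-${LANG:-C}}}"
}

# Show the full settings if the locale command is available.
if command -v locale >/dev/null 2>&1; then
    locale
fi

echo "collation locale: $(effective_collate)"
```

If LC_ALL is set in your environment, setting LC_COLLATE alone will not change how sort orders its input.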