Linux sort order that I don't understand


Question

I noticed the following sort output. Can anyone explain why the '.' line is sorted first in the first run and last in the second?

I was trying to debug a program that looks up lines in a large sorted file, but the culprit seems to be my expectation/understanding of how Linux sort orders lines.

$ sort --debug
sort: using ‘en_US.UTF-8’ sorting rules
/mnt/x/E
/mnt/x/.
<ctrl-D>
/mnt/x/.
________
/mnt/x/E
________
$ sort --debug
sort: using ‘en_US.UTF-8’ sorting rules
/mnt/x/Ed
/mnt/x/.T
<ctrl-D>
/mnt/x/Ed
_________
/mnt/x/.T
_________
$

Recommended answer

It's not that "." comes before or after other characters; it's that it is not being examined at all. The lines are sorted purely on their alphabetic characters.

In your first example, <end-of-string> sorts before E; in the second example, E sorts before T.
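The reasoning above can be roughly mimicked by deleting punctuation and comparing what is left byte by byte. This is only a sketch to illustrate the idea, not the actual collation algorithm that sort uses under en_US.UTF-8; the variable names are made up for this example.

```shell
# Rough approximation only: drop punctuation, then sort the remaining
# characters in plain byte order. Real locale collation is more involved.
keys_first=$(printf '%s\n' '/mnt/x/E' '/mnt/x/.' | LC_ALL=C tr -d '[:punct:]' | LC_ALL=C sort)
keys_second=$(printf '%s\n' '/mnt/x/Ed' '/mnt/x/.T' | LC_ALL=C tr -d '[:punct:]' | LC_ALL=C sort)

# First pair: "mntx" (from /mnt/x/.) is a prefix of "mntxE", so it sorts first.
printf '%s\n' "$keys_first"

# Second pair: "mntxEd" sorts before "mntxT" (from /mnt/x/.T) because E < T.
printf '%s\n' "$keys_second"
```

With the punctuation removed, both orderings match what the question observed.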

This behaviour depends on the locale settings for collation. You can influence it with environment variables such as LC_COLLATE:

$ env LC_COLLATE=C sort
/mnt/x/Ed
/mnt/x/.T
^D
/mnt/x/.T
/mnt/x/Ed
$ env LC_COLLATE=en_US.UTF-8 sort
/mnt/x/Ed
/mnt/x/.T
^D
/mnt/x/Ed
/mnt/x/.T
$

Under the C locale, all ASCII characters are considered and are sorted in ASCII order. In many other locales punctuation is ignored, which is presumably what causes the behaviour you're seeing.
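A non-interactive version of the C-locale run, as a minimal sketch. It uses LC_ALL rather than LC_COLLATE because LC_ALL, when set, takes precedence over the individual LC_* variables; the variable name `c_sorted` is just for illustration.

```shell
# Force byte-order (ASCII) comparison regardless of the caller's environment.
# LC_ALL overrides LC_COLLATE and LANG when it is set.
c_sorted=$(printf '%s\n' '/mnt/x/Ed' '/mnt/x/.T' | LC_ALL=C sort)

# '.' (0x2E) comes before 'E' (0x45) in ASCII, so /mnt/x/.T is listed first.
printf '%s\n' "$c_sorted"
```

If a program looks up lines by plain byte comparison, sorting its input file this way should keep the file's order consistent with what the program expects.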

You can examine your locale settings using the locale command.
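For instance, the collation setting a command would actually see can be checked like this. It is a small sketch that assumes the locale utility is installed (it may be absent on minimal systems); the variable name is hypothetical.

```shell
# Show the effective LC_COLLATE value when LC_ALL=C is in force.
if command -v locale >/dev/null 2>&1; then
  collate_setting=$(LC_ALL=C locale | grep '^LC_COLLATE=')
else
  collate_setting='locale command not found'
fi
printf '%s\n' "$collate_setting"
```

Running plain `locale` without the LC_ALL=C prefix shows the settings of the current shell instead.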
