Output from sort does not appear to be sorted

Question

I have the following text file (sort_test.txt):

PGA_scaffold1__77
PGA_scaffold2__36
PGA_scaffold3__111
PGA_scaffold4__129
PGA_scaffold5__109
PGA_scaffold6__104
PGA_scaffold7__69
PGA_scaffold8__63
PGA_scaffold9__45
PGA_scaffold10__49
PGA_scaffold11__79
PGA_scaffold12__71
PGA_scaffold13__52
PGA_scaffold14__91
PGA_scaffold15__101
PGA_scaffold16__33
PGA_scaffold17__51
PGA_scaffold18__69

When I try to sort the file with the following code, the sort output seems to be out of order (specifically, lines 9 and 10):

IN: awk -F"_" '{print $1"_"$2"_"$3"_"$4}' sort_test.txt | sort

OUT:

PGA_scaffold10__49
PGA_scaffold11__79
PGA_scaffold12__71
PGA_scaffold13__52
PGA_scaffold14__91
PGA_scaffold15__101
PGA_scaffold16__33
PGA_scaffold17__51
PGA_scaffold1__77
PGA_scaffold18__69
PGA_scaffold2__36
PGA_scaffold3__111
PGA_scaffold4__129
PGA_scaffold5__109
PGA_scaffold6__104
PGA_scaffold7__69
PGA_scaffold8__63
PGA_scaffold9__45

Why do lines 9 and 10 seem to be out of order?

Desired output:

PGA_scaffold10__49
PGA_scaffold11__79
PGA_scaffold12__71
PGA_scaffold13__52
PGA_scaffold14__91
PGA_scaffold15__101
PGA_scaffold16__33
PGA_scaffold17__51
PGA_scaffold18__69
PGA_scaffold1__77
PGA_scaffold2__36
PGA_scaffold3__111
PGA_scaffold4__129
PGA_scaffold5__109
PGA_scaffold6__104
PGA_scaffold7__69
PGA_scaffold8__63
PGA_scaffold9__45

If I modify the code to only print the first three fields, the sorting does what I expect:

IN: awk -F"_" '{print $1"_"$2"_"$3}' sort_test.txt | sort

OUT:

PGA_scaffold1_
PGA_scaffold10_
PGA_scaffold11_
PGA_scaffold12_
PGA_scaffold13_
PGA_scaffold14_
PGA_scaffold15_
PGA_scaffold16_
PGA_scaffold17_
PGA_scaffold18_
PGA_scaffold2_
PGA_scaffold3_
PGA_scaffold4_
PGA_scaffold5_
PGA_scaffold6_
PGA_scaffold7_
PGA_scaffold8_
PGA_scaffold9_

So, it appears that there's something about the fourth field that impacts the sorting, but it's not clear why.

The problem is, I need the initial sorting, but with lines 9 and 10 swapped.

Does anyone have any thoughts on why the sorting is happening like this, and how I can modify it so that it produces the expected output?

Recommended answer

Thanks so much to @Barmar for pointing me to the Unix & Linux forum!

I managed to indirectly find the answer to my problem in this post:

Is gnu coreutils broken?

The solution was to change my locale!

My locale was as follows:

$ locale
LANG=en_US.UTF-8
LANGUAGE=en_US
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

After running:

$ export LC_COLLATE=C

I was able to get my desired sorting output.
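
For anyone who would rather not change the locale for the whole shell session, here is a minimal sketch of applying the C locale to a single sort call. The file name sort_sample.txt and the three-line sample are my own assumptions for illustration, not part of the original setup. In the C locale, sort compares raw byte values, and the digits 7 and 8 come before the underscore, so scaffold17 and scaffold18 land ahead of scaffold1__. My understanding is that UTF-8 locales such as en_US.UTF-8 tend to give punctuation like the underscore very little weight when collating, which would explain the original ordering, but I have not verified that for every system.

```shell
# Build a small sample with the three lines that matter for the ordering issue.
# (sort_sample.txt is a hypothetical file name used only for this sketch.)
printf '%s\n' PGA_scaffold1__77 PGA_scaffold18__69 PGA_scaffold17__51 > sort_sample.txt

# Set the locale for this one command only; the rest of the session is untouched.
# LC_ALL takes precedence over LC_COLLATE, so this works even if LC_ALL is already set.
sorted=$(LC_ALL=C sort sort_sample.txt)

# Print the byte-wise sorted result.
printf '%s\n' "$sorted"
```

The same prefix should work at the end of the original pipeline, for example `awk ... sort_test.txt | LC_ALL=C sort`.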
