sort on pipe-delimited fields not behaving as expected
Question
Consider this tiny text file:
ab
a
If we run it through sort(1), we get
a
ab
since of course a comes before ab.
But now consider this file:
ab|c
a|c
If we run it through sort -t'|', we again expect a to sort before ab, but it does not! (Try it under your version of Unix and see.)
What I think is happening here is that the -t option to sort is not really delimiting fields: it may be changing the way (say) the start of field 2 would be found, but it's not changing the way field 1 ends. a|c sorts after ab|c because '|' comes after 'b' in ASCII. (It's as if the -t'|' argument is ignored, because you get the same result without it.)
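The observation above can be checked with a small sketch. It assumes a POSIX-style shell and sort, and it forces the C locale (LC_ALL=C) so that comparison is bytewise; other locales may collate punctuation differently. The variable names are only illustrative.

```shell
# Byte values in an ASCII-compatible codeset: a leading quote makes printf
# print the numeric value of the character that follows it.
pipe_code=$(printf '%d' "'|")
b_code=$(printf '%d' "'b")
printf "'|' = %s, 'b' = %s\n" "$pipe_code" "$b_code"

# Sort the same two lines with and without -t; neither run uses -k.
with_t=$(printf 'ab|c\na|c\n' | LC_ALL=C sort -t'|')
without_t=$(printf 'ab|c\na|c\n' | LC_ALL=C sort)
[ "$with_t" = "$without_t" ] && echo "same order with and without -t"
```

In ASCII, '|' is 124 and 'b' is 98, so a bytewise comparison of the full lines places ab|c ahead of a|c in both runs.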
So is this a bug in sort or in my understanding of it? And is there a way to sort on the first pipe-delimited field properly?
This question came up in my attempt to answer another SO question, Join Statement omitting entries.
Recommended answer
sort's default behavior is to treat everything from field 1 to the end of the line as the sort key. The -t option only sets the field separator; it does not by itself limit where a key ends. If you want it to sort on field 1 first, then field 2, you need to specify that explicitly.
$ sort -k1,1 -k2,2 -t'|' <<< $'ab|c\na|c'
a|c
ab|c
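As a rough illustration of the difference between an open-ended key and a bounded one, the sketch below runs the same input through -k1 (which starts at field 1 and runs to the end of the line) and through -k1,1 -k2,2. It assumes a POSIX-style sort and forces the C locale for bytewise comparison; the variable names are hypothetical.

```shell
# The two lines from the question.
input=$(printf 'ab|c\na|c\n')

# -k1 has no end position, so the key still spans the whole line,
# separators included: 'b' is compared against '|'.
open_key=$(printf '%s\n' "$input" | LC_ALL=C sort -t'|' -k1)

# -k1,1 stops the first key at the end of field 1, so "a" is compared
# against "ab"; -k2,2 is only consulted when the first keys are equal.
closed_key=$(printf '%s\n' "$input" | LC_ALL=C sort -t'|' -k1,1 -k2,2)

printf 'open key (-k1):\n%s\n' "$open_key"
printf 'bounded keys (-k1,1 -k2,2):\n%s\n' "$closed_key"
```

With the open-ended key, ab|c still comes first; with the bounded keys, a|c comes first, matching the output shown in the answer.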