sort on pipe-delimited fields not behaving as expected

Question

Consider this tiny text file:

ab
a

If we run it through sort(1), we get

a
ab

because of course a comes before ab.

But now consider this file:

ab|c
a|c

If we run it through sort -t'|', we again expect a to sort before ab, but it does not! (Try it under your version of Unix and see.)

What I think is happening here is that the -t option to sort is not really delimiting fields -- it may be changing the way (say) the start of field 2 would be found, but it's not changing the way field 1 ends. a|c sorts after ab|c because '|' comes after 'b' in ASCII. (It's as if the -t'|' argument is ignored, because you get the same result without it.)
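The observation above can be reproduced with a quick check. This is only a sketch: it assumes a POSIX shell and sets LC_ALL=C so that lines are compared byte by byte (in other locales the collation rules may differ, though the resulting order may well be the same). The variable names with_t and without_t are made up for the illustration.

```shell
# Force byte-wise (ASCII) comparison so locale collation rules do not interfere.
export LC_ALL=C

# With no -k option the whole line is the sort key, so -t'|' has no effect here.
with_t=$(printf 'ab|c\na|c\n' | sort -t'|')
without_t=$(printf 'ab|c\na|c\n' | sort)

# Show the order produced with -t'|' alone.
printf '%s\n' "$with_t"

# Confirm that dropping -t gives the identical order.
if [ "$with_t" = "$without_t" ]; then
    echo "same order with and without -t"
fi
```

In the C locale 'b' is 0x62 and '|' is 0x7C, so the line ab|c compares lower than a|c at the second byte.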

So is this a bug in sort or in my understanding of it? And is there a way to sort on the first pipe-delimited field properly?

This question came up in my attempt to answer another SO question, "Join Statement omitting entries".

Recommended answer

sort's default behavior is to treat everything from field 1 to the end of the line as the sort key. If you want it to sort on field 1 first, then field 2, you need to specify that explicitly.

$ sort -k1,1 -k2,2 -t'|' <<< $'ab|c\na|c'
a|c
ab|c
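To see why the end position of the key matters, it may help to compare -k1 (key runs from field 1 to the end of the line) with -k1,1 (key is field 1 only). The following is a minimal sketch, assuming LC_ALL=C for byte-wise comparison; the helper name sort_by_first_field is hypothetical and not part of sort.

```shell
# Hypothetical helper: sort stdin on the first pipe-delimited field only.
sort_by_first_field() {
    # -k1,1 ends the key at the end of field 1, so the '|' delimiter
    # takes no part in the comparison. LC_ALL=C forces byte-wise ordering.
    LC_ALL=C sort -t'|' -k1,1
}

# Key is field 1 only: "a" sorts before "ab".
printf 'ab|c\na|c\n' | sort_by_first_field

# Key starts at field 1 but has no end position, so it extends to the
# end of the line and the '|' is compared against 'b' again.
printf 'ab|c\na|c\n' | LC_ALL=C sort -t'|' -k1
```

The first command should list a|c before ab|c, while the second falls back to the whole-line ordering from the question.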
