Sort and keep a unique duplicate which has the highest value
Question
I have a file like the one shown below. For each combination of the first and second fields, I want to keep the line with the highest value in the third field (the lines marked with arrows; the arrows are not part of the actual file).
1 1 10
1 1 12 <-
1 2 6 <-
1 3 4 <-
2 4 32
2 4 37
2 4 39
2 4 40 <-
2 45 12
2 45 15 <-
3 3 12
3 3 15
3 3 17
3 3 19 <-
3 15 4
3 15 9 <-
4 17 25
4 17 28
4 17 32
4 17 36 <-
4 18 4 <-
The output should look like this:
1 1 12
1 2 6
1 3 4
2 4 40
2 45 15
3 3 19
3 15 9
4 17 36
4 18 4
I thought I might get there by playing with the sort and uniq commands, but I made a mess.
Any ideas?
Very important note: the entries are not neatly sorted to begin with; I sorted them for this example with sort -k1,1 -k2,2 -k3,3
Thanks in advance, everyone.
Recommended answer
This is a bit quirky, but:
sort -nr myfile.txt | rev | uniq -f1 | rev | sort -n
Output:
1 1 12
1 2 6
1 3 4
2 4 40
2 45 15
3 15 9
3 3 19
4 17 36
4 18 4
How it works:
- Sort reverse numerically, putting the highest values at the top (so they are the ones kept)
- Reverse each line, so the last field comes first (needed because uniq can only skip leading fields)
- Keep only the first line of each run of duplicates, ignoring the first field (which was the last field)
- Reverse each line back to its original order
- Sort the lines from low to high again
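The steps above can be traced one stage at a time. This is only a sketch: the four-line sample and the variable names are made up for illustration, and LC_ALL=C is an assumption added here so that tie-breaking in sort is predictable across machines.

```shell
# Assumption: byte-order locale, so sort breaks ties the same way everywhere.
export LC_ALL=C

# Hypothetical unsorted sample; the real input would be myfile.txt.
sample='2 4 39
1 1 10
2 4 40
1 1 12'

# Stage 1: reverse numeric sort, so the largest entry of each group comes first.
stage1=$(printf '%s\n' "$sample" | sort -nr)
# Stage 2: reverse the characters of every line; the third field now leads.
stage2=$(printf '%s\n' "$stage1" | rev)
# Stage 3: skip the first field and drop lines that repeat the previous one.
stage3=$(printf '%s\n' "$stage2" | uniq -f1)
# Stage 4: reverse the characters back, then sort ascending again.
result=$(printf '%s\n' "$stage3" | rev | sort -n)

printf '%s\n' "$result"
```

Printing stage1, stage2 and stage3 in turn shows what each command contributes. Note that rev reverses characters, not fields, so a value such as 40 appears as 04 in the intermediate stages; that is harmless because uniq skips that field.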
Probably not the most efficient approach in the world, but at least each step makes some sense.
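One possible alternative, which is not part of the answer above, sorts on explicit numeric keys and lets awk keep the first line of each group. As far as I can tell, a plain sort -nr compares only the leading number and falls back to a whole-line comparison for ties, so third-field values with different digit counts (9 versus 10) may not end up in the intended order; explicit keys avoid that. The function name keep_max and the demo input are hypothetical.

```shell
# Assumption: byte-order locale, so sorting behaves the same on every machine.
export LC_ALL=C

# keep_max is a hypothetical helper name, not something from the answer.
# It reads whitespace-separated lines from the given files (or from stdin).
keep_max() {
    # Order by field 1 and field 2 ascending, field 3 descending, all numeric,
    # then let awk print only the first line seen for each (field 1, field 2) pair.
    sort -k1,1n -k2,2n -k3,3nr "$@" | awk '!seen[$1, $2]++'
}

# Unsorted demo input, including a 9-versus-10 case in the third field.
printf '%s\n' '3 15 9' '1 1 10' '3 3 19' '3 15 10' '1 1 12' | keep_max
```

For the file in the question this would be called as keep_max myfile.txt. Because the second field is also sorted numerically, 3 3 comes before 3 15, which matches the ordering in the desired output shown in the question.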