How to find the max value of the third field for each pair of first two fields using awk
Problem description
The file contents are as follows:
333379266 834640619 88
333379280 834640621 99
333379280 834640621 66
333376672 857526666 99
333376672 857526666 78
333376672 857526666 62
The first two columns may contain duplicates. For each distinct pair of values in the first two columns, I want to output the pair together with the maximum value of the third column. In this case, the result file should be as follows:
333379266 834640619 88
333379280 834640621 99
333376672 857526666 99
My attempt was:
awk '{d[$1" "$2]=$3;if ($3>=d[$1" "$2]){num[$1" "$2]=$3} else{num[$1" "$2]=d[$1" "$2]} }END{for(i in num) print i,num[i]}'
But it does not work: the condition $3>=d[$1" "$2] is always true, so num is always set to $3. Since awk reads the file line by line, num ends up holding the last value for each key, not the max one.
I would appreciate it if anyone could give me a solution. Thanks in advance.
Recommended answer
Could you please try the following.
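awk '
{
  # Keep the larger of the stored value and the current third field.
  # An unseen key compares as 0, so this assumes non-negative values.
  array[$1,$2]=array[$1,$2]>$3?array[$1,$2]:$3
}
END{
  for(i in array){
    split(i,key,SUBSEP)   # array[$1,$2] joins the two fields with SUBSEP
    print key[1],key[2],array[i]
  }
}
' Input_file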
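Because `for (i in array)` visits keys in an unspecified order, the output lines may not appear in the same order as in the input. As a minimal sketch (the function name `max_by_key` and the arrays `max` and `order` are illustrative names, not part of the original answer), the variant below remembers the order in which each key first appears, and uses an explicit `in` test so it does not assume the third field is non-negative:

```shell
# max_by_key FILE: print each distinct (field 1, field 2) pair once,
# in first-seen order, with the largest third field seen for it.
# The function and array names are hypothetical, chosen for illustration.
max_by_key() {
  awk '
  {
    key = $1 OFS $2
    if (!(key in max)) {                  # first time this pair is seen
      order[++n] = key                    # remember input order
      max[key] = $3
    } else if ($3 + 0 > max[key] + 0) {   # force a numeric comparison
      max[key] = $3
    }
  }
  END {
    for (i = 1; i <= n; i++)
      print order[i], max[order[i]]
  }
  ' "$1"
}
```

Calling `max_by_key Input_file` on the sample data from the question should print the three expected lines in the order the pairs first occur.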
Issue with OP's code:
In the line d[$1" "$2]=$3;if ($3>=d[$1" "$2]) you assign $3 to array d before comparing it with the current line's third field, so the condition compares $3 with itself and is always true. That is the major issue I can see in OP's attempt.
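To see the effect in isolation, here is a small sketch (the function name `show_always_true` is hypothetical) that feeds the three lines of one key through the same assign-then-compare sequence and prints the result of the comparison next to each third field:

```shell
# Demonstrates why the condition in the question can never be false:
# d[...] is overwritten with $3 just before $3 is compared against it.
show_always_true() {
  printf '%s\n' \
    '333376672 857526666 99' \
    '333376672 857526666 78' \
    '333376672 857526666 62' |
  awk '{
    d[$1" "$2] = $3                 # overwrite first ...
    print $3, ($3 >= d[$1" "$2])    # ... so $3 is compared with itself
  }'
}
```

Every line should report 1 (true) in the second column, including the lines with 78 and 62, which are smaller than the earlier 99.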
OP's attempt fix: IMHO my solution above should be good, but here is a fix for OP's attempt as well. The comparison now happens before any assignment, and d is updated only when the current third field is at least as large as the stored value, so d holds the running maximum for each key.
awk '{if ($3>=d[$1" "$2]){d[$1" "$2]=$3};num[$1" "$2]=d[$1" "$2]}END{for(i in num) print i,num[i]}' Input_file