Print whole lines when a duplicate is found
Problem description
Here is a fragment of my input:
DGD3 SOL10
DGD53 SOL15
DGD100 SOL15
DGD92 SOL20
DGD41 SOL22
DGD62 SOL35
DGD13 SOL40
DGD13 SOL40
My expected output:
DGD53 SOL15
DGD100 SOL15
DGD13 SOL40
DGD13 SOL40
In my data I sometimes have duplicate SOL values. There are never more than two repetitions: no SOL appears three times in a file, only as a pair. SOL is in my second column ($2). So I need a program that prints the whole line (DGD and SOL) whenever it finds a duplicate SOL ($2). Could you help me?
Recommended answer
Another awk solution. It reads the file in a single pass, does not need the file to be sorted, and still works correctly if the second field has more than 2 instances. In the worst case (every $2 is unique) it hashes the complete file in memory and produces no output:
$ awk '{
    if (!c[$2]++)            # if first instance of $2
        a[$2] = $0           # store it
    else {
        if (c[$2] == 2) {    # if second instance
            print a[$2]      # print previous
            delete a[$2]     # no need to waste my memory any more
        }
        print                # after first instance of $2 we always print current
    }
}' file
Output:
DGD53 SOL15
DGD100 SOL15
DGD13 SOL40
DGD13 SOL40
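If reading the input twice is acceptable, a two-pass awk is a common alternative (it is not part of the answer above). A minimal sketch, assuming the input is a regular file rather than a pipe, since awk must open it twice. The first pass (NR == FNR) only counts each $2, and the second pass prints every line whose $2 was counted more than once, in the original order. The function name print_dup_lines is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print every line whose second field occurs more than once.
# Assumption: "$1" is a regular file, because it is read twice.
print_dup_lines() {
    # Pass 1 (NR == FNR is true only while reading the first copy): count each $2.
    # Pass 2: the bare pattern count[$2] > 1 prints the line when $2 is duplicated.
    awk 'NR == FNR { count[$2]++; next } count[$2] > 1' "$1" "$1"
}

# Usage with the sample input from the question.
input=$(mktemp)
printf '%s\n' 'DGD3 SOL10' 'DGD53 SOL15' 'DGD100 SOL15' 'DGD92 SOL20' \
    'DGD41 SOL22' 'DGD62 SOL35' 'DGD13 SOL40' 'DGD13 SOL40' > "$input"
print_dup_lines "$input"
rm -f "$input"
```

Compared with the single-pass answer, this version keeps only one counter per distinct $2 in memory instead of buffering whole lines, at the cost of reading the file twice.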