Print lines that contain a value in a specific column shared by more than 1 entity in another column
Problem description
I want to extract only those lines whose value in Column 2 is shared by at least 2 unique values in Column 1.
Given the following input (in this case, 3 tab-separated columns):
waterline-n below-sheath-v 14.8097
dock-n below-sheath-v 14.5095
waterline-n below-steel-n 11.0330
picnic-n below-steel-n 12.2277
wavefront-n at-part-of-variance-n 18.4888
wavefront-n between-part-of-variance-n 17.0656
audience-b between-part-of-variance-n 17.6346
game-n between-part-of-variance-n 14.9652
whereabouts-n become-rediscovery-n 11.3556
whereabouts-n get-tee-n 10.9091
I want the following output:
waterline-n below-sheath-v 14.8097
dock-n below-sheath-v 14.5095
waterline-n below-steel-n 11.0330
picnic-n below-steel-n 12.2277
wavefront-n between-part-of-variance-n 17.0656
audience-b between-part-of-variance-n 17.6346
game-n between-part-of-variance-n 14.9652
Is it possible to do this using grep?
Recommended answer
You can do this by reading the file twice with awk and using an array. I think this would be hard to do with grep alone.
awk 'FNR==NR {a[$2]++;next} a[$2]>1' file file
waterline-n below-sheath-v 14.8097
dock-n below-sheath-v 14.5095
waterline-n below-steel-n 11.0330
picnic-n below-steel-n 12.2277
wavefront-n between-part-of-variance-n 17.0656
audience-b between-part-of-variance-n 17.6346
game-n between-part-of-variance-n 14.9652
In the first pass (FNR==NR), awk uses each value of column 2 as a key in the array a and increments its count every time that value occurs.
In the second pass, it looks up each line's column 2 value in the array and prints the line if the count is greater than one.
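Note that the one-liner counts how many lines carry each column 2 value, not how many distinct column 1 values it appears with. For the sample input the two are the same, but if a (column 1, column 2) pair could repeat, a stricter variant might look like the sketch below. The function name shared_col2 and the arrays seen and count are illustrative names, not part of the original answer, and the sketch keeps awk's default whitespace field splitting as the one-liner does.

```shell
# Hypothetical helper: print lines whose column 2 value occurs with
# at least two distinct column 1 values. Takes the input file as $1.
shared_col2() {
    awk '
        # Pass 1: count each (column 1, column 2) pair only once.
        FNR == NR {
            if (!seen[$1, $2]++) count[$2]++
            next
        }
        # Pass 2: print lines whose column 2 value has 2+ distinct partners.
        count[$2] > 1
    ' "$1" "$1"
}
```

It would be called as shared_col2 file. On the sample input it should select the same lines as the original command, since no pair is repeated there.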