Using awk, remove lines with a duplicate pair of columns at different indexes
Problem description
Given the following two lines:
foo1 foo2 foo3 foo4
foo3 foo4 foo1 foo2
Line 2 is a duplicate because its columns 1 and 2 equal columns 3 and 4 of line 1.
What is the shortest way to remove the second line using awk?
Recommended answer
This seems to work, but check it against your own data:
cat <<EOF >file1
foo1 foo2 foo3 foo4
foo3 foo4 foo1 foo2
foo2 foo1 foo3 foo4
fooA fooB fooC fooD
fooC fooD fooA fooB
fooD fooC fooA fooB
fooD fooB fooC fooA
EOF
awk '!f1[$1$2$3$4]++ && !f1[$3$4$1$2]++' file1
#Output
foo1 foo2 foo3 foo4
foo2 foo1 foo3 foo4
fooA fooB fooC fooD
fooD fooC fooA fooB
fooD fooB fooC fooA
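One edge case is worth checking for. When the two pairs on a line are identical (for example a b a b), both subscripts in the one-liner are the same string: the first test increments the counter, the second test then sees a non-zero value, and the line is dropped even on its first appearance. Below is a minimal sketch of a variant that avoids this, assuming four whitespace-separated columns. The function name dedupe_pairs and the array name seen are illustrative and not part of the original answer.

```shell
# Hypothetical helper: print a line only if neither its own key nor its
# pair-swapped key was recorded by an earlier printed line.
dedupe_pairs() {
  awk '{
    # Keys are joined with FS so adjacent fields cannot run together.
    key     = $1 FS $2 FS $3 FS $4   # pairs in the order they appear
    swapped = $3 FS $4 FS $1 FS $2   # the same two pairs, exchanged
  }
  !((key in seen) || (swapped in seen)) {
    seen[key] = 1                    # record only lines that are kept
    print
  }' "$@"
}

printf '%s\n' 'a b a b' 'a b c d' 'c d a b' | dedupe_pairs
# prints:
# a b a b
# a b c d
```

The `in` operator tests membership without creating the element, so a key is stored only when its line is actually printed. On the file1 sample above, this variant should give the same five output lines as the one-liner.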
As pointed out in the comments, plain concatenation can run adjacent fields together, so that the fields foob ar and foo bar produce the same key. To avoid this confusion, it is better to use the field separator FS (whatever value it has been set to; a space by default) as part of the array indices:
awk '!f1[$1FS$2FS$3FS$4]++ && !f1[$3FS$4FS$1FS$2]++' file1
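To make the difference concrete, here is a small sketch that runs both forms of the one-liner on two distinct rows whose fields concatenate to the same string. The sample rows and the variable names rows, plain, and joined are made up for illustration.

```shell
# Hypothetical input: two different rows that both concatenate to "foobarxy".
rows='foob ar x y
foo bar x y'

# Plain concatenation: both rows yield the same key, so the second is dropped.
plain=$(printf '%s\n' "$rows" |
  awk '!f1[$1 $2 $3 $4]++ && !f1[$3 $4 $1 $2]++')

# FS-joined keys: "foob ar x y" and "foo bar x y" stay distinct, so both survive.
joined=$(printf '%s\n' "$rows" |
  awk '!f1[$1 FS $2 FS $3 FS $4]++ && !f1[$3 FS $4 FS $1 FS $2]++')

echo "plain concatenation:"; echo "$plain"
echo "FS-joined keys:";      echo "$joined"
```

With plain concatenation only the first row is printed, while the FS-joined version prints both. awk's comma subscript form, f1[$1,$2,$3,$4], joins the parts with SUBSEP instead and would likely serve the same purpose.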