Remove entirely identical duplicate columns in Unix
Problem description
Say I have a file like this:
number 2 6 7 10 number 6 13
name1 A B C D name1 B E
name2 A B C D name2 B E
name3 B A D A name3 A F
name4 B A D A name4 A F
I want to remove the columns that are exact duplicates of an earlier column, so that the output file looks like this:
number 2 6 7 10 13
name1 A B C D E
name2 A B C D E
name3 B A D A F
name4 B A D A F
I use the sort and uniq commands for lines, but I don't know how to do the same for columns. Can anyone suggest a good way?
Recommended answer
Here is a way with awk that preserves the column order:
awk 'NR==1{for(i=1;i<=NF;i++)b[$i]++&&a[i]}{for(i in a)$i="";gsub(" +"," ")}1' file
Output
number 2 6 7 10 13
name1 A B C D E
name2 A B C D E
name3 B A D A F
name4 B A D A F
How it works
NR==1
If this is the first record
for(i=1;i<=NF;i++)
Loop over the fields; NF is the number of fields.
b[$i]++&&a[i]
If $i (the data contained in field i) has already occurred earlier in the record, add an element to array a with the key i.
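The short-circuit idiom can be tried in isolation. The following is a minimal sketch, not part of the original answer: dup_positions, seen and dup are hypothetical names chosen for illustration. It relies on the awk behaviour that merely referencing an array element creates it.

```shell
# Hypothetical helper: print the positions of fields that repeat an earlier field.
# State is kept across records, so this demo feeds it a single line.
dup_positions() {
  awk '{
    for (i = 1; i <= NF; i++)
      seen[$i]++ && dup[i]      # dup[i] is only evaluated, and so created, on a repeat
    for (i = 1; i <= NF; i++)
      if (i in dup) print i
  }'
}

# The header line from the question: fields 6 and 7 repeat fields 1 and 3.
printf 'number 2 6 7 10 number 6 13\n' | dup_positions
```

For the header line above this prints 6 and 7, one per line, which are exactly the field positions the answer blanks out.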
The next block is executed on every record, including the first.
{for(i in a)$i="";
For every key in a, set the corresponding field to the empty string.
gsub(" +"," ")
Remove the extra spaces left behind by the emptied fields.
1
Always evaluates to true, so every record is printed.
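Note that the one-liner decides which columns to drop from the first record only, so two columns that share a header value but hold different data would also lose the second copy. Below is a minimal sketch of a variant that compares whole columns instead. It assumes the input is a regular file that can be read twice and that fields are whitespace-separated; dedup_columns and the variable names are hypothetical and not part of the original answer.

```shell
# Sketch: drop every column whose full contents repeat an earlier column.
# dedup_columns is a hypothetical helper name; it passes the same file to
# awk twice, so it needs a regular file rather than a pipe.
dedup_columns() {
  awk '
    # Pass 1: build one string per column from all of its values.
    NR == FNR {
      for (i = 1; i <= NF; i++) col[i] = col[i] SUBSEP $i
      if (NF > n) n = NF
      next
    }
    # Start of pass 2: mark columns whose contents were already seen.
    FNR == 1 {
      for (i = 1; i <= n; i++)
        if (seen[col[i]]++) drop[i] = 1
    }
    # Pass 2: rebuild each record from the fields that are not marked.
    {
      out = ""; sep = ""
      for (i = 1; i <= NF; i++)
        if (!(i in drop)) { out = out sep $i; sep = " " }
      print out
    }
  ' "$1" "$1"
}
```

Called as dedup_columns file, this keeps the first occurrence of each distinct column and preserves column order. Rebuilding the record field by field also avoids the trailing space that blanking the last field would leave behind.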