Merge two files based on one column with Awk
Problem description
I am trying to merge two tab-delimited files of unequal length. I need to merge the files based on column 1 and write the values from the 3rd column of each file to a new file. If either file is missing an id (an id that is not common to both), the new file should get a blank value in that position:
File1:
id1 2199 082
id2 0909 20909
id3 8002 8030
id4 28080 80828
File2:
id1 988 00808
id2 808 80808
id4 8080 2525
id6 838 3800
Merged file :
id1 082 00808
id2 20909 80808
id3 8030
id4 80828 2525
id6 3800
I went through many forums and posts, and so far I have this:
awk -F\t 'NR==FNR{A[$1]=$1; B[$1]=$1; next} {$2=A[$1]; $3=B[$1]}1'
But it does not yield the right result. Can anyone please suggest a fix? Thanks a lot!
Recommended answer
$ awk -F'\t' 'NR==FNR{A[$1]=$3; next} {A[$1]; B[$1]=$3} END{for (id in A) print id,A[id],B[id]}' OFS='\t' File1 File2 | sort
id1 082 00808
id2 20909 80808
id3 8030
id4 80828 2525
id6 3800
How it works
This script uses two associative arrays, A and B. For every line in File1, A has a key corresponding to the id, with the third field as its value. For every id in File2, A also has a key (but not necessarily a value). For File2, array B has a key for every id, with the corresponding value from the third column.
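To try this on the sample data, the following sketch recreates the two files with real tab characters and runs the same program spread over several commented lines. The temporary directory and the variable name merged are assumptions made for illustration; the awk logic is unchanged from the one-liner above.

```shell
#!/bin/sh
# Recreate the sample data with real tabs (temporary location is an assumption).
tmpdir=$(mktemp -d)
printf 'id1\t2199\t082\nid2\t0909\t20909\nid3\t8002\t8030\nid4\t28080\t80828\n' > "$tmpdir/File1"
printf 'id1\t988\t00808\nid2\t808\t80808\nid4\t8080\t2525\nid6\t838\t3800\n' > "$tmpdir/File2"

merged=$(awk -F'\t' -v OFS='\t' '
    # First file: NR equals FNR only while File1 is being read.
    NR == FNR { A[$1] = $3; next }
    # Second file: referencing A[$1] creates the key if it is missing.
    { A[$1]; B[$1] = $3 }
    # A now has a key for every id; a missing value prints as an empty field.
    END { for (id in A) print id, A[id], B[id] }
' "$tmpdir/File1" "$tmpdir/File2" | sort)

printf '%s\n' "$merged"
rm -rf "$tmpdir"
```

In the real output, the id6 line has an empty second field (two consecutive tabs) and the id3 line ends with a tab, which is easy to miss when the result is displayed with collapsed whitespace.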
-F'\t'
This sets the input field separator to a tab. Note that \t must be quoted to protect it from the shell.
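The effect of the quoting can be seen without awk at all. This small sketch prints the argument exactly as the shell passes it on; the variable names unquoted and quoted are only for illustration.

```shell
# Show what a command actually receives in each case.
unquoted=$(printf '%s\n' -F\t)     # the shell removes the backslash
quoted=$(printf '%s\n' -F'\t')     # the backslash survives inside single quotes
printf 'unquoted: %s\nquoted:   %s\n' "$unquoted" "$quoted"
```

Unquoted, awk receives -Ft. Some implementations, such as GNU awk, document a special case that treats -Ft as a tab, but relying on that is not portable, so quoting is the safer habit.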
NR==FNR{A[$1]=$3; next}
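This populates the associative array A from the first file. The condition NR==FNR is true while awk is reading the first file, and next skips the remaining rule for those lines.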
A[$1]; B[$1]=$3
This populates the associative array B from the second file. The bare reference A[$1] also makes sure that array A has a key for every id in File2.
END{for (id in A) print id,A[id],B[id]}
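This prints out the results. Because A has a key for every id found in either file, each id is printed once, and a value missing from one file comes out as an empty field.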
OFS='\t'
This sets the output field separator to a tab.
sort
The awk construct for (key in array) is not guaranteed to return the keys in any particular order. We use sort to put the output into ascending order by id.
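If first-seen input order is acceptable instead of sorted order, one possible variant records each id as it first appears and walks a numeric index in the END block, so no external sort is needed. This is a sketch rather than part of the original answer; the arrays seen and order, the counter n, and the shortened sample files are assumptions for illustration.

```shell
#!/bin/sh
# Shortened sample files with real tabs (temporary location is an assumption).
tmpdir=$(mktemp -d)
printf 'id1\t2199\t082\nid3\t8002\t8030\n' > "$tmpdir/File1"
printf 'id1\t988\t00808\nid6\t838\t3800\n' > "$tmpdir/File2"

ordered=$(awk -F'\t' -v OFS='\t' '
    # Remember each id the first time it appears, in input order.
    !($1 in seen) { seen[$1] = 1; order[++n] = $1 }
    NR == FNR { A[$1] = $3; next }
    { B[$1] = $3 }
    # Walk the numeric index instead of using for (id in A).
    END { for (i = 1; i <= n; i++) print order[i], A[order[i]], B[order[i]] }
' "$tmpdir/File1" "$tmpdir/File2")

printf '%s\n' "$ordered"
rm -rf "$tmpdir"
```

The ids come out in the order File1 lists them, followed by any ids that occur only in File2. That matches sorted order for the sample data in the question, but not in general, so sort remains the right choice when ascending order is required.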