Merge two files based on one column with Awk

Question

I am trying to merge two tab-delimited files of unequal length. I need to merge them on column 1 and write the values from column 3 of each file to a new file. If an id is missing from either file (i.e., it is not common to both), the corresponding value in the new file should be blank:

File1: 
id1 2199 082
id2 0909 20909
id3 8002 8030
id4 28080 80828

File2:

id1 988 00808
id2 808 80808
id4 8080 2525
id6 838 3800

Merged file:

id1 082 00808
id2 20909 80808
id3 8030  
id4 80828 2525
id6   3800

I went through many forums and posts, and so far I have this:

awk -F\t 'NR==FNR{A[$1]=$1; B[$1]=$1; next} {$2=A[$1]; $3=B[$1]}1'

but it does not produce the right result. Can anyone suggest a fix? Thanks a lot!

Recommended answer

$ awk -F'\t' 'NR==FNR{A[$1]=$3; next} {A[$1]; B[$1]=$3} END{for (id in A) print id,A[id],B[id]}' OFS='\t' File1 File2 | sort
id1     082     00808
id2     20909   80808
id3     8030
id4     80828   2525
id6             3800
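To check the answer end to end, the sketch below writes the question's sample data to two temporary tab-delimited files and runs the same command on them. It assumes mktemp -d is available (as on GNU and BSD systems); the file names File1 and File2 simply mirror the question. Because the question's listing shows spaces, the tabs are written explicitly with printf.

```shell
#!/bin/sh
# Reproduce the answer on the question's sample data.
# Assumption: mktemp -d exists on this system.
tmp=$(mktemp -d)
printf 'id1\t2199\t082\nid2\t0909\t20909\nid3\t8002\t8030\nid4\t28080\t80828\n' > "$tmp/File1"
printf 'id1\t988\t00808\nid2\t808\t80808\nid4\t8080\t2525\nid6\t838\t3800\n' > "$tmp/File2"

# Same awk program as the answer; sort fixes the order of the for-in loop.
merged=$(awk -F'\t' 'NR==FNR{A[$1]=$3; next} {A[$1]; B[$1]=$3} END{for (id in A) print id,A[id],B[id]}' OFS='\t' "$tmp/File1" "$tmp/File2" | sort)
printf '%s\n' "$merged"

rm -r "$tmp"
```

Missing values come out as empty fields: id3 ends with a trailing tab, and id6 has two tabs in a row.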

How it works

This script uses two associative arrays. For every line in File1, array A gets a key for the id, whose value is the third field. For every id in File2, A also gets a key (but not necessarily a value). For File2, array B gets a key for every id, whose value is the corresponding third-column value.
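To see what the two arrays hold once both files have been read, the following hypothetical diagnostic variant replaces the answer's END block with one that dumps A and B. It uses three ids from the question (id1 in both files, id3 only in File1, id6 only in File2) and assumes mktemp -d is available; LC_ALL=C sort only makes the dump order stable.

```shell
#!/bin/sh
# Hypothetical diagnostic: same first two rules as the answer, but END
# dumps every key/value pair of A and B instead of the merged rows.
tmp=$(mktemp -d)
printf 'id1\t2199\t082\nid3\t8002\t8030\n' > "$tmp/File1"
printf 'id1\t988\t00808\nid6\t838\t3800\n' > "$tmp/File2"

dump=$(awk -F'\t' '
  NR == FNR { A[$1] = $3; next }
  { A[$1]; B[$1] = $3 }
  END {
    for (id in A) print "A[" id "]=" A[id]
    for (id in B) print "B[" id "]=" B[id]
  }' "$tmp/File1" "$tmp/File2" | LC_ALL=C sort)
printf '%s\n' "$dump"

rm -r "$tmp"
```

A ends up with all three ids (id6 with an empty value), while B holds only the ids from File2. In the answer itself, the B[id] reference inside the END loop also adds empty B entries for File1-only ids, which is harmless there.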

-F'\t'

This sets the input field separator to a tab. Note that \t must be quoted to protect it from the shell: unquoted, the shell removes the backslash and awk receives -Ft instead.
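The quoting point can be checked without awk at all, because printf '%s\n' shows exactly which argument the shell passes on. The second half of this sketch shows why a tab separator matters: a field that contains a space stays in one piece. The input line is made up for illustration.

```shell
#!/bin/sh
# What the shell hands to the command: unquoted, the backslash is removed.
unquoted=$(printf '%s\n' -F\t)     # argument becomes -Ft
quoted=$(printf '%s\n' -F'\t')     # argument stays -F\t; awk turns \t into a tab
printf 'unquoted: %s\nquoted:   %s\n' "$unquoted" "$quoted"

# With a tab as the field separator, spaces inside a field are kept.
field2=$(printf 'id1\tsome value\t082\n' | awk -F'\t' '{ print $2 }')
printf 'field 2: %s\n' "$field2"
```

Some awk implementations treat a bare -Ft as a tab, but POSIX does not require it, so quoting is the portable choice.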

NR==FNR{A[$1]=$3; next}

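This fills the associative array A from the first file. NR (the record number across all input) equals FNR (the record number within the current file) only while the first file is being read, so for those lines A[$1] is set to the third field, and next skips the remaining rule.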
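The NR==FNR idiom is easy to see by printing both counters. This sketch uses two made-up two-line files in a temporary directory (assuming mktemp -d is available) and prints FILENAME, NR, FNR and the value of NR==FNR (1 for true, 0 for false) for every record.

```shell
#!/bin/sh
# NR keeps counting across files; FNR restarts at 1 for each new file.
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/File1"
printf 'c\nd\n' > "$tmp/File2"

# cd inside the command substitution so FILENAME prints the short names.
trace=$(cd "$tmp" && awk '{ print FILENAME, NR, FNR, (NR==FNR) }' File1 File2)
printf '%s\n' "$trace"

rm -r "$tmp"
```

One caveat of this idiom: if the first file is empty, NR and FNR are equal while the second file is read, so its lines would be treated as first-file lines.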

A[$1]; B[$1]=$3

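This fills the associative array B from the second file, mapping each id to its third field. The bare reference A[$1] also creates an (empty) entry in A for every id in File2, so ids that occur only in File2 are still included when A is traversed at the end.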
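The bare statement A[$1] works because, in awk, merely referencing an array element creates it with an empty value. A minimal BEGIN-only sketch (the key id6 is just borrowed from the question):

```shell
#!/bin/sh
# Referencing an element, even without assigning to it, creates the key.
result=$(awk 'BEGIN {
  if ("id6" in A) print "before: present"; else print "before: absent"
  A["id6"]                     # bare reference, no assignment
  if ("id6" in A) print "after: present"; else print "after: absent"
  print "value: [" A["id6"] "]"
}')
printf '%s\n' "$result"
```

The "in" test itself does not create the key, which is why the first check reports it as absent.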

END{for (id in A) print id,A[id],B[id]}

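After both files have been read, this loops over every key in A, which now holds the ids from both files, and prints each id followed by its File1 value and its File2 value. A value that is missing from either file prints as an empty field.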
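To make the blank-value behavior concrete, this sketch unrolls the END loop by hand for id3 (only in File1) and id6 (only in File2), using the values from the question, and then counts the tab-separated fields of each printed row.

```shell
#!/bin/sh
# Hand-unrolled END loop for two ids; the array contents mimic what the
# answer builds from the question's files.
rows=$(awk 'BEGIN {
  OFS = "\t"
  A["id3"] = "8030"             # from File1, no B entry
  A["id6"]; B["id6"] = "3800"   # key created while reading File2
  print "id3", A["id3"], B["id3"]
  print "id6", A["id6"], B["id6"]
}')
printf '%s\n' "$rows"

# Each row still splits into three tab-separated fields.
counts=$(printf '%s\n' "$rows" | awk -F'\t' '{ print NF }')
printf '%s\n' "$counts"
```

Both rows keep three fields, so the columns stay aligned even when a value is missing.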

OFS='\t'

This sets the output field separator to a tab.
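OFS only affects print statements whose expressions are separated by commas; expressions written side by side are concatenated with nothing in between. This sketch feeds one made-up, space-separated line through both forms. As in the answer, OFS='\t' is given as an operand assignment: awk interprets the \t escape and performs the assignment before reading the operand that follows it, here - for standard input.

```shell
#!/bin/sh
# Comma in print: fields joined with OFS (a tab here).
with_comma=$(printf 'id1 082\n' | awk '{ print $1, $2 }' OFS='\t' -)
# Juxtaposition: plain string concatenation, OFS is not used.
concatenated=$(printf 'id1 082\n' | awk '{ print $1 $2 }' OFS='\t' -)
printf '%s\n' "$with_comma" "$concatenated"
```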

sort

The awk construct for (key in array) is not guaranteed to return the keys in any particular order, so we pipe the output through sort to put it in ascending order by id.
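Plain sort compares whole lines as text, so an id such as id10 would sort before id2. If input order is preferable to sorted order, one alternative (a variant, not the answer's code; the function name merge_in_order is hypothetical) is to record the order in which each id is first seen and loop over that list in END. It assumes mktemp -d is available and reuses the question's sample data.

```shell
#!/bin/sh
# Hypothetical helper: merge column 3 of two tab-delimited files on
# column 1, printing ids in first-seen order instead of piping to sort.
merge_in_order() {
  awk -F'\t' -v OFS='\t' '
    NR == FNR {
      if (!($1 in A)) order[++n] = $1      # first sighting in file 1
      A[$1] = $3
      next
    }
    {
      if (!($1 in A)) { order[++n] = $1; A[$1] = "" }   # id only in file 2
      B[$1] = $3
    }
    END {
      for (i = 1; i <= n; i++) print order[i], A[order[i]], B[order[i]]
    }' "$1" "$2"
}

tmp=$(mktemp -d)
printf 'id1\t2199\t082\nid2\t0909\t20909\nid3\t8002\t8030\nid4\t28080\t80828\n' > "$tmp/File1"
printf 'id1\t988\t00808\nid2\t808\t80808\nid4\t8080\t2525\nid6\t838\t3800\n' > "$tmp/File2"
ordered=$(merge_in_order "$tmp/File1" "$tmp/File2")
printf '%s\n' "$ordered"
rm -r "$tmp"
```

The result lists File1's ids in their original order, followed by ids that appear only in File2; for this sample it matches the sorted output.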
