Matching data to the correct ID from two files in awk

Problem description

I am trying to combine data from two different files. In each file, some data is linked to an ID. I want to 'combine' both files in the sense that every ID must be printed to a new file, and the data from both files must be matched to the correct ID. Example:

cat file_1
1.01    data_a
1.02    data_b
1.03    data_c
1.04    data_d
1.05    data_e
1.06    data_f

cat file_2
1.01    data_aa
1.03    data_cc
1.05    data_ee
1.09    data_ii

The desired result is:

cat files_combined
1.01    data_a    data_aa
1.02    data_b
1.03    data_c    data_cc
1.04    data_d    
1.05    data_e    data_ee
1.06    data_f
1.09              data_ii
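
The listings above show the columns padded with spaces, but the commands further down split on tabs (-F'\t'), so the files are presumably tab-separated. A minimal sketch that recreates the sample inputs with real tabs, assuming that format, so the commands below can be tried as-is:

```shell
#!/usr/bin/env bash
# Recreate the example inputs, assuming tab-separated "ID<TAB>value" lines.
# printf reuses its format string until all arguments are consumed.
printf '%s\t%s\n' 1.01 data_a 1.02 data_b 1.03 data_c \
    1.04 data_d 1.05 data_e 1.06 data_f > file_1
printf '%s\t%s\n' 1.01 data_aa 1.03 data_cc 1.05 data_ee 1.09 data_ii > file_2
```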

I know how to do it the long, slow way, by looping over each ID. A rough example:

awk -F'\t' '{print $1}' file_1 > files_combined
awk -F'\t' '{print $1}' file_2 >> files_combined
sort -u -n files_combined > tmp && mv tmp files_combined

# For every ID, scan both files again and rewrite the whole output file (slow)
count=0
while IFS= read -r ID; do
    count=$((count + 1))
    value1=$(awk -F'\t' -v id="$ID" '$1 == id {print $2}' file_1)
    value2=$(awk -F'\t' -v id="$ID" '$1 == id {print $2}' file_2)
    awk -F'\t' -v OFS='\t' -v n="$count" -v v1="$value1" -v v2="$value2" \
        'NR == n {$2 = v1; $3 = v2} 1' files_combined > tmp && mv tmp files_combined
done < files_combined

This does the job for a file with 10 lines, but with 100000 lines it takes far too long, because every ID triggers two full scans of the inputs plus a rewrite of the output. I'm looking for the compact awk solution that surely exists.
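
For comparison with the per-ID loop, the usual awk idiom reads each file exactly once and keeps the values in associative arrays keyed by ID, then prints every ID seen in either file. A minimal sketch, assuming tab-separated input with at most one line per ID in each file; merge_by_id is a hypothetical helper name, and awk's "for (id in ids)" order is unspecified, hence the final numeric sort:

```shell
#!/usr/bin/env bash
# Sketch: full outer join of two tab-separated files on column 1.
# Assumes at most one line per ID per file. merge_by_id is a hypothetical name.
merge_by_id() {
    awk -F'\t' -v OFS='\t' '
        # Lines from the first file: remember value and ID
        FILENAME == ARGV[1] { a[$1] = $2; ids[$1]; next }
        # Lines from the second file: remember value and ID
        { b[$1] = $2; ids[$1] }
        # Every ID from either file; a missing value prints as an empty field
        END { for (id in ids) print id, a[id], b[id] }
    ' "$1" "$2" | LC_ALL=C sort -t $'\t' -k1,1n
}

# Usage: merge_by_id file_1 file_2 > files_combined
```

Comparing FILENAME with ARGV[1], rather than the common FNR == NR test, keeps the sketch correct even when the first file is empty.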

Solution provided by bob dylan:

join -j 1 -a 1 -a 2 -t $'\t' -o auto file_1 file_2
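
join only pairs lines correctly when both inputs are sorted on the join field in the collation order join itself uses (the sample files happen to be), and -o auto is a GNU coreutils extension. A sketch that sorts both inputs on the fly, assuming bash (for $'\t' and process substitution) and GNU join; join_by_id is a hypothetical name:

```shell
#!/usr/bin/env bash
# Sketch: outer join on column 1 of two tab-separated files with GNU join.
# join needs both inputs sorted on the join field, so sort them first;
# LC_ALL=C makes sort and join agree on the (byte-wise) collation.
join_by_id() {
    LC_ALL=C join -j 1 -a 1 -a 2 -t $'\t' -o auto \
        <(LC_ALL=C sort -t $'\t' -k1,1 "$1") \
        <(LC_ALL=C sort -t $'\t' -k1,1 "$2")
}

# Usage: join_by_id file_1 file_2 > files_combined
```

With -o auto, unpaired lines are padded to the same number of fields, so an ID that exists in only one file gets an empty field for the other.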

Recommended answer

Does it have to be awk, or did you choose it because you think it's the best (easiest) way?

You can do this with join:

$ join -j 1 -a 1 -a 2 -o auto file_1 file_2 | column -t -s' ' -o' '
1.01 data_a data_aa
1.02 data_b
1.03 data_c data_cc
1.04 data_d
1.05 data_e data_ee
1.06 data_f
1.09        data_ii

Edit: following KamilCuk's excellent suggestion, the output can be piped through column afterwards to preserve its column layout.
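
When the result is blank-separated, as in the command above where join uses its default separator, an empty field is easy to miss. GNU join's -e option replaces missing fields with a fixed string when an output format (here -o auto) is given. A sketch using NA as an arbitrary placeholder; join_with_placeholder is a hypothetical name:

```shell
#!/usr/bin/env bash
# Sketch: same outer join, but missing values are printed as "NA"
# (an arbitrary placeholder) instead of an empty field.
join_with_placeholder() {
    LC_ALL=C join -j 1 -a 1 -a 2 -e NA -o auto \
        <(LC_ALL=C sort -k1,1 "$1") \
        <(LC_ALL=C sort -k1,1 "$2")
}

# Usage: join_with_placeholder file_1 file_2 | column -t
```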
