Using AWK to merge two files based on multiple columns

Question

I have two CSV files, with ; (semicolon) as the separator, that I need to merge using AWK, based on three key columns in each file. The key columns are not consecutive. The idea is to take two columns from file B and print them after all the columns from file A.

File A (keys are in A1, A3, and A5):

A1;A2;A3;A4;A5
K1;D1;K2;D2;K3
K4;D3;K5;D4;K6
K7;D5;K8;D6;K9
K1;D7;K2;D8;K3

File B (keys are in B1, B2, and B4):

B1;B2;B3;B4;B5
K1;K2;D9;K3;D0
K4;K5;DA;K6;DB
KA;KB;DC;KC;DD

The merge should produce:

A1;A2;A3;A4;A5;;
K1;D1;K2;D2;K3;D9;D0
K4;D3;K5;D4;K6;DA;DB
K7;D5;K8;D6;K9;;
K1;D7;K2;D8;K3;D9;D0

I have found several examples here on SO (for example "How to merge two files based on the first three columns using awk" and "How to merge two files using AWK?") and elsewhere, but I haven't been able to adapt them to my needs. They aren't documented well enough for an AWK n00b like me to really understand how they work.

The closest I have got is:

awk -F \; -v OFS=\; 'FNR==NR{c[$1]=$3 FS $5;next}{ print $0, c[$1]}' B A

But it still leaves out one semicolon (that is, one column) from output lines 1 and 4:

A1;A2;A3;A4;A5;
K1;D1;K2;D2;K3;D9;D0
K4;D3;K5;D4;K6;DA;DB
K7;D5;K8;D6;K9;
K1;D7;K2;D8;K3;D9;D0

And how do I state which columns I want to use for the comparison? Apparently it is currently comparing only the first column.

Recommended answer

This prints the unmatched lines without the extra ; delimiters. You have to provide file B first.

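 awk 'BEGIN {
          OFS=FS=";"    # semicolon-separated input and output
      }

      # First file (fileB): store B3 and B5 under the key B1;B2;B4
      FNR==NR {
          key[$1 FS $2 FS $4]=$3 OFS $5
      }

      # Second file (fileA): build the same key from A1;A3;A5 and look it up
      FNR!=NR {
          c=$1 FS $3 FS $5
          if(c in key)
               print $0,key[c]
          else
               print
      }'  fileB fileA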

If you need the extra delimiters, change the last print to print $0 OFS OFS.
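
To check the variant with the extra delimiters against the sample data from the question, the whole thing can be put into a small script. This is only a sketch under a few assumptions: the scratch directory and the file names fileA and fileB are made up for illustration, and the `next` in the first block is a small restructuring that replaces the FNR!=NR test without changing the result.

```shell
#!/bin/sh
# Hypothetical scratch directory and file names, for illustration only.
tmpdir=$(mktemp -d)

cat > "$tmpdir/fileA" <<'EOF'
A1;A2;A3;A4;A5
K1;D1;K2;D2;K3
K4;D3;K5;D4;K6
K7;D5;K8;D6;K9
K1;D7;K2;D8;K3
EOF

cat > "$tmpdir/fileB" <<'EOF'
B1;B2;B3;B4;B5
K1;K2;D9;K3;D0
K4;K5;DA;K6;DB
KA;KB;DC;KC;DD
EOF

result=$(awk '
BEGIN { OFS = FS = ";" }

# First file (fileB): store B3 and B5 under the composite key B1;B2;B4.
FNR == NR { key[$1 FS $2 FS $4] = $3 OFS $5; next }

# Second file (fileA): build the same key from A1;A3;A5 and look it up.
{
    c = $1 FS $3 FS $5
    if (c in key)
        print $0, key[c]
    else
        print $0 OFS OFS    # pad unmatched rows with two empty fields
}' "$tmpdir/fileB" "$tmpdir/fileA")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

The header line of file A has no match in file B, so it is padded like any other unmatched row. The attempt in the question printed only one trailing semicolon on those rows because print $0, c[$1] adds a single OFS followed by an empty string when the key is missing.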
