awk compare two files and merge output

Problem description

Original question

I have 2 files 1.csv and 2.csv

1.csv:-

AK,BA,Alpha,1095  
ALL,SA,Alpha,9592  

2.csv:-

AK,BA,SPAM,10  

I want to merge the files so that the output looks like this:

OUTPUT:-

AK,BA,Alpha,1095,SPAM,10  
ALL,SA,Alpha,9592,NA,NA


Updated question

I have 2 files alpha1.csv and SPAM1.csv

$ cat alpha1.csv  
AKTEL_BANGLADESH,BANGLADESH,Alphanumeric_A_MSISDN_blocking,1095  
ALJAWAL_SAUDI_TELECOM_COMPANY,SAUDI_ARABIA,Alphanumeric_A_MSISDN_blocking,9592  
B-MOBILE_BRUNEI,BRUNEI,Alphanumeric_A_MSISDN_blocking,3  


$ cat SPAM1.csv  
AIN_AIS_GLOBAL_COMMUNICATIONS,THAILAND,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),1  
AKTEL_BANGLADESH,BANGLADESH,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),16  
ALJAWAL_SAUDI_TELECOM_COMPANY,SAUDI_ARABIA,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),10593  
AT&T_WIRELESS,UNITED_STATES,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),218  
BANGLALINK_SHEBA_BANGLADESH,BANGLADESH,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),111  

expected output:

AIN_AIS_GLOBAL_COMMUNICATIONS,THAILAND,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),1,**NA,NA**  
AKTEL_BANGLADESH,BANGLADESH,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),16,Alphanumeric_A_MSISDN_blocking,1095  
ALJAWAL_SAUDI_TELECOM_COMPANY,SAUDI_ARABIA,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),10593,Alphanumeric_A_MSISDN_blocking,9592  
AT&T_WIRELESS,UNITED_STATES,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),218,**NA,NA**  
BANGLALINK_SHEBA_BANGLADESH,BANGLADESH,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),111,**NA,NA**  
B-MOBILE_BRUNEI,BRUNEI,**NA,NA**,Alphanumeric_A_MSISDN_blocking,3  

My command prints only the lines of SPAM1.csv, matched against alpha1.csv. Lines that exist only in alpha1.csv, such as B-MOBILE_BRUNEI, are not printed:

$ awk 'BEGIN{FS=OFS=","} FNR==NR {a[$1,$2]=$3 FS $4; next} {print $0, (i=a[$1,$2]?a[$1,$2]:"NA,NA")}' alpha1.csv SPAM1.csv  
AIN_AIS_GLOBAL_COMMUNICATIONS,THAILAND,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),1,NA,NA  
AKTEL_BANGLADESH,BANGLADESH,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),16,Alphanumeric_A_MSISDN_blocking,1095  
ALJAWAL_SAUDI_TELECOM_COMPANY,SAUDI_ARABIA,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),10593,Alphanumeric_A_MSISDN_blocking,9592  
AT&T_WIRELESS,UNITED_STATES,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),218,NA,NA  
BANGLALINK_SHEBA_BANGLADESH,BANGLADESH,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),111,NA,NA  

Solution

You can use this, for example:

$ awk 'BEGIN{FS=OFS=","} FNR==NR {a[$1,$2]=$3 FS $4; next} {print $0, (($1,$2) in a?a[$1,$2]:"NA,NA")}' f2 f1
AK,BA,Alpha,1095,SPAM,10
ALL,SA,Alpha,9592,NA,NA
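
To try the command end to end, the following sketch recreates the two sample files from the original question in a temporary directory and runs the same awk program on them. The shell variable names tmpdir and result are illustrative only and are not part of the answer.

```shell
# Recreate the sample files from the original question in a scratch directory.
# (tmpdir and result are illustrative names, not part of the original answer.)
tmpdir=$(mktemp -d)
printf '%s\n' 'AK,BA,Alpha,1095' 'ALL,SA,Alpha,9592' > "$tmpdir/1.csv"
printf '%s\n' 'AK,BA,SPAM,10' > "$tmpdir/2.csv"

# The first file on the command line (2.csv) fills the lookup array;
# the second file (1.csv) is the one whose lines are printed.
result=$(awk 'BEGIN { FS = OFS = "," }
FNR == NR { a[$1,$2] = $3 FS $4; next }
{ print $0, (($1,$2) in a ? a[$1,$2] : "NA,NA") }' "$tmpdir/2.csv" "$tmpdir/1.csv")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

The file order matters: swapping the two arguments would print the lines of 2.csv instead of those of 1.csv.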

Explanation

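  • BEGIN{FS=OFS=","} sets both the input and the output field separator to a comma.
  • FNR==NR {a[$1,$2]=$3 FS $4; next} runs only while the first file (f2) is being read, because that is the only time the per-file line number FNR equals the overall line number NR. It stores the 3rd and 4th fields in the array a[], indexed by the tuple ($1,$2), and then skips to the next line.
  • {print $0, (($1,$2) in a?a[$1,$2]:"NA,NA")} runs for the second file (f1). It prints each line followed by the matching entry from the array, or NA,NA if the array has no entry for ($1,$2).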
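
The command above prints only the lines of the second file. For the updated question, where lines found only in alpha1.csv must also appear, one possible extension is to record which keys were matched and print the leftover entries in an END block. This is a sketch under stated assumptions, not the original answerer's solution: the array names alpha, order and seen are made up here, the key is built as $1 FS $2 rather than with the ($1,$2) subscript, and unmatched alpha1.csv lines are appended at the end in the order they appear in that file.

```shell
# Recreate the sample files from the updated question in a scratch directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/alpha1.csv" <<'EOF'
AKTEL_BANGLADESH,BANGLADESH,Alphanumeric_A_MSISDN_blocking,1095
ALJAWAL_SAUDI_TELECOM_COMPANY,SAUDI_ARABIA,Alphanumeric_A_MSISDN_blocking,9592
B-MOBILE_BRUNEI,BRUNEI,Alphanumeric_A_MSISDN_blocking,3
EOF
cat > "$tmpdir/SPAM1.csv" <<'EOF'
AIN_AIS_GLOBAL_COMMUNICATIONS,THAILAND,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),1
AKTEL_BANGLADESH,BANGLADESH,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),16
ALJAWAL_SAUDI_TELECOM_COMPANY,SAUDI_ARABIA,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),10593
AT&T_WIRELESS,UNITED_STATES,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),218
BANGLALINK_SHEBA_BANGLADESH,BANGLADESH,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),111
EOF

# Hypothetical full-outer-join variant (array names are illustrative).
merged=$(awk 'BEGIN { FS = OFS = "," }
FNR == NR {                                 # first file: alpha1.csv
    key = $1 FS $2
    if (!(key in alpha)) order[++n] = key   # remember keys in file order
    alpha[key] = $3 FS $4
    next
}
{                                           # second file: SPAM1.csv
    key = $1 FS $2
    print $0, (key in alpha ? alpha[key] : "NA,NA")
    seen[key] = 1                           # mark the key as matched
}
END {                                       # alpha1.csv keys never seen in SPAM1.csv
    for (i = 1; i <= n; i++)
        if (!(order[i] in seen))
            print order[i], "NA,NA", alpha[order[i]]
}' "$tmpdir/alpha1.csv" "$tmpdir/SPAM1.csv")

printf '%s\n' "$merged"
rm -rf "$tmpdir"
```

With the sample data this prints the five SPAM1.csv lines first, then B-MOBILE_BRUNEI,BRUNEI,NA,NA,Alphanumeric_A_MSISDN_blocking,3 as the last line. If the merged output has to be sorted by key, piping it through sort would be one option.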
