Linux awk: comparing two CSV files and creating a new file with a flag

Views: 623 Published: 2016/7/28 16:38:02 Tags: linux bash csv awk export-to-csv


Problem description

I have two CSV files that I need to compare, writing the differences to a newly formatted file. Samples are given below.

OLD file

DTL,11111111,1111111111111111,11111111111,Y,N,xx,xx
DTL,22222222,2222222222222222,22222222222,Y,Y,cc,cc
DTL,33333333,3333333333333333,33333333333,Y,Y,dd,dd
DTL,44444444,4444444444444444,44444444444,Y,Y,ss,ss
DTL,55555555,5555555555555555,55555555555,Y,Y,qq,qq

NEW file

DTL,11111111,1111111111111111,11111111111,Y,Y,xx,xx
DTL,22222222,2222222222222222,22222222222,Y,N,cc,cc
DTL,44444444,4444444444444444,44444444444,Y,Y,ss,ss
DTL,55555555,5555555555555555,55555555555,Y,Y,qq,qq
DTL,77777777,7777777777777777,77777777777,N,N,ee,ee

Output file

I want to compare the old and new CSV files, find the changes made in the new file, and add a flag to each changed record to denote the change:

U - if the new file record is updated
D - if a record existing in the old file is deleted in the new file
N - if a record existing in the new file is not available in the old file

The sample output file looks like this:

DTL,11111111,1111111111111111,11111111111,Y,Y,xx,xx U
DTL,22222222,2222222222222222,22222222222,Y,N,cc,cc U
DTL,33333333,3333333333333333,33333333333,Y,Y,dd,dd D
DTL,77777777,7777777777777777,77777777777,N,N,ee,ee N

I used the diff command, but it also repeats the updated records, which is not what I want.

< DTL,11111111,1111111111111111,11111111111,Y,N,xx,xx
< DTL,22222222,2222222222222222,22222222222,Y,Y,cc,cc
< DTL,33333333,3333333333333333,33333333333,Y,Y,dd,dd
---
> DTL,11111111,1111111111111111,11111111111,Y,Y,xx,xx
> DTL,22222222,2222222222222222,22222222222,Y,N,cc,cc
5a5
> DTL,77777777,7777777777777777,77777777777,N,N,ee,ee

I also tried an AWK one-liner to filter the records:

 awk 'NR==FNR{A[$1];next}!($1 in A)' FS=: old.csv new.csv

The problem is that it doesn't give me the records that belong only to the OLD file, namely:

DTL,33333333,3333333333333333,33333333333,Y,Y,dd,dd
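
A note on why the one-liner behaves this way: the sample lines contain no colon, so with `FS=:` the field `$1` is the whole line, and the command prints every line of new.csv that has no identical line in old.csv. The sketch below goes the other way round. It assumes the second comma-separated field is a unique record ID (the question does not state this), reads new.csv first, and then prints the old.csv records whose ID no longer appears. The scratch directory and the `seen` array name are illustrative only.

```shell
# Recreate the sample files from the question in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/old.csv" <<'EOF'
DTL,11111111,1111111111111111,11111111111,Y,N,xx,xx
DTL,22222222,2222222222222222,22222222222,Y,Y,cc,cc
DTL,33333333,3333333333333333,33333333333,Y,Y,dd,dd
DTL,44444444,4444444444444444,44444444444,Y,Y,ss,ss
DTL,55555555,5555555555555555,55555555555,Y,Y,qq,qq
EOF
cat > "$workdir/new.csv" <<'EOF'
DTL,11111111,1111111111111111,11111111111,Y,Y,xx,xx
DTL,22222222,2222222222222222,22222222222,Y,N,cc,cc
DTL,44444444,4444444444444444,44444444444,Y,Y,ss,ss
DTL,55555555,5555555555555555,55555555555,Y,Y,qq,qq
DTL,77777777,7777777777777777,77777777777,N,N,ee,ee
EOF

# First file (new.csv): remember every ID (assumed to be field 2).
# Second file (old.csv): print the records whose ID was not seen.
deleted=$(awk -F, 'NR == FNR { seen[$2]; next } !($2 in seen)' \
    "$workdir/new.csv" "$workdir/old.csv")
printf '%s\n' "$deleted"
```

Swapping the file order is what turns "only in NEW" into "only in OLD"; comparing on the ID rather than the whole line keeps updated records out of the result.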

I also started an awk-driven bash script to achieve this, but couldn't find a good example to help.

 myscript.awk

BEGIN {
    FS = ","    # input field separator
    OFS = ","   # output field separator
}

NR > 1 {
    # Flag values:
    # N - new record, D - deleted, U - updated
    id = $1
    name = $2
    flag = "N"   # awk string literals take double quotes

    # Print the columns in the new order. The commas tell awk to join them with OFS.
    print id, name, flag
}

 awk -f myscript.awk old.csv new.csv > formatted.csv

Solution

This might work for you:

diff  -W999 --side-by-side OLD NEW |
sed '/^[^\t]*\t\s*|\t\(.*\)/{s//\1 U/;b};/^\([^\t]*\)\t*\s*<$/{s//\1 D/;b};/^.*>\t\(.*\)/{s//\1 N/;b};d'
DTL,11111111,1111111111111111,11111111111,Y,Y,xx,xx U
DTL,22222222,2222222222222222,22222222222,Y,N,cc,cc U
DTL,33333333,3333333333333333,33333333333,Y,Y,dd,dd D
DTL,77777777,7777777777777777,77777777777,N,N,ee,ee N
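
To see what the filter is matching, it can help to look at the raw side-by-side output first. GNU diff marks a changed pair with `|` in the gutter, a line only in the first file with `<`, and a line only in the second file with `>`. The sketch below is not part of the original answer: it assumes GNU diff, adds the `--suppress-common-lines` option to hide unchanged rows, and counts each marker for the question's sample data. The scratch directory and variable names are illustrative only.

```shell
# Recreate the sample files from the question in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/OLD" <<'EOF'
DTL,11111111,1111111111111111,11111111111,Y,N,xx,xx
DTL,22222222,2222222222222222,22222222222,Y,Y,cc,cc
DTL,33333333,3333333333333333,33333333333,Y,Y,dd,dd
DTL,44444444,4444444444444444,44444444444,Y,Y,ss,ss
DTL,55555555,5555555555555555,55555555555,Y,Y,qq,qq
EOF
cat > "$workdir/NEW" <<'EOF'
DTL,11111111,1111111111111111,11111111111,Y,Y,xx,xx
DTL,22222222,2222222222222222,22222222222,Y,N,cc,cc
DTL,44444444,4444444444444444,44444444444,Y,Y,ss,ss
DTL,55555555,5555555555555555,55555555555,Y,Y,qq,qq
DTL,77777777,7777777777777777,77777777777,N,N,ee,ee
EOF

# diff exits with status 1 when the files differ, so "|| true" keeps
# the script going under "set -e".
sidebyside=$(diff -W999 --side-by-side --suppress-common-lines \
    "$workdir/OLD" "$workdir/NEW" || true)

# Count the rows carrying each gutter marker. The sample data itself
# contains none of these characters.
changed=$(printf '%s\n' "$sidebyside" | grep -c '|')
only_old=$(printf '%s\n' "$sidebyside" | grep -c '<')
only_new=$(printf '%s\n' "$sidebyside" | grep -c '>')
echo "changed=$changed only_old=$only_old only_new=$only_new"
```

The three counts correspond to the U, D and N flags that the filter appends.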

An awk solution along the same lines:

diff -W999 --side-by-side OLD NEW |
awk '/[|][\t]/{split($0,a,"[|][\t]");print a[2]" U"};/[\t] *<$/{split($0,a,"[\t]* *<$");print a[1]" D"};/>[\t]/{split($0,a,">[\t]");print a[2]" N"}'
DTL,11111111,1111111111111111,11111111111,Y,Y,xx,xx U
DTL,22222222,2222222222222222,22222222222,Y,N,cc,cc U
DTL,33333333,3333333333333333,33333333333,Y,Y,dd,dd D
DTL,77777777,7777777777777777,77777777777,N,N,ee,ee N
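
The same flags can also be produced without diff. The following is a minimal sketch rather than the answer's method. It assumes the second field is a unique record ID, that old.csv is not empty, and that sorting the result by that ID is acceptable; the file name flagged.csv and the `old` array are illustrative.

```shell
# Recreate the sample files from the question in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/old.csv" <<'EOF'
DTL,11111111,1111111111111111,11111111111,Y,N,xx,xx
DTL,22222222,2222222222222222,22222222222,Y,Y,cc,cc
DTL,33333333,3333333333333333,33333333333,Y,Y,dd,dd
DTL,44444444,4444444444444444,44444444444,Y,Y,ss,ss
DTL,55555555,5555555555555555,55555555555,Y,Y,qq,qq
EOF
cat > "$workdir/new.csv" <<'EOF'
DTL,11111111,1111111111111111,11111111111,Y,Y,xx,xx
DTL,22222222,2222222222222222,22222222222,Y,N,cc,cc
DTL,44444444,4444444444444444,44444444444,Y,Y,ss,ss
DTL,55555555,5555555555555555,55555555555,Y,Y,qq,qq
DTL,77777777,7777777777777777,77777777777,N,N,ee,ee
EOF

awk -F, '
    # First file (old.csv): store each record under its ID (assumed: field 2).
    NR == FNR      { old[$2] = $0; next }
    # Second file (new.csv): an ID never stored is a new record.
    !($2 in old)   { print $0 " N"; next }
    # Same ID but different content is an update.
    old[$2] != $0  { print $0 " U" }
    # Forget every ID that still exists in new.csv.
                   { delete old[$2] }
    # Whatever is left was present only in old.csv, so it was deleted.
    END            { for (id in old) print old[id] " D" }
' "$workdir/old.csv" "$workdir/new.csv" |
    sort -t, -k2,2 > "$workdir/flagged.csv"   # the END loop order is unspecified

cat "$workdir/flagged.csv"
```

Because the comparison is keyed on the ID rather than on line position, this variant does not depend on both files listing records in the same order.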
