Linux awk: comparing two CSV files and creating a new file with a flag
Problem description
I have two CSV files that I need to compare, writing the differences to a newly formatted file. Samples are given below.
OLD file
DTL,11111111,1111111111111111,11111111111,Y,N,xx,xx
DTL,22222222,2222222222222222,22222222222,Y,Y,cc,cc
DTL,33333333,3333333333333333,33333333333,Y,Y,dd,dd
DTL,44444444,4444444444444444,44444444444,Y,Y,ss,ss
DTL,55555555,5555555555555555,55555555555,Y,Y,qq,qq
NEW file
DTL,11111111,1111111111111111,11111111111,Y,Y,xx,xx
DTL,22222222,2222222222222222,22222222222,Y,N,cc,cc
DTL,44444444,4444444444444444,44444444444,Y,Y,ss,ss
DTL,55555555,5555555555555555,55555555555,Y,Y,qq,qq
DTL,77777777,7777777777777777,77777777777,N,N,ee,ee
Output file
I want to compare the old and new CSV files, find the changes that have taken effect in the new file, and append a flag to each changed record to denote the change:
U - the record in the new file was updated
D - a record that exists in the old file was deleted from the new file
N - a record that exists in the new file is not present in the old file
The sample output file looks like this:
DTL,11111111,1111111111111111,11111111111,Y,Y,xx,xx U
DTL,22222222,2222222222222222,22222222222,Y,N,cc,cc U
DTL,33333333,3333333333333333,33333333333,Y,Y,dd,dd D
DTL,77777777,7777777777777777,77777777777,N,N,ee,ee N
I used the diff command, but it repeats the updated records as well, which is not what I want.
1,3c1,2
< DTL,11111111,1111111111111111,11111111111,Y,N,xx,xx
< DTL,22222222,2222222222222222,22222222222,Y,Y,cc,cc
< DTL,33333333,3333333333333333,33333333333,Y,Y,dd,dd
---
> DTL,11111111,1111111111111111,11111111111,Y,Y,xx,xx
> DTL,22222222,2222222222222222,22222222222,Y,N,cc,cc
5a5
> DTL,77777777,7777777777777777,77777777777,N,N,ee,ee
I also used an awk one-liner to filter my records:
awk 'NR==FNR{A[$1];next}!($1 in A)' FS=: old.csv new.csv
The problem with this is that it does not give me the records that belong only to the OLD file, which is:
DTL,33333333,3333333333333333,33333333333,Y,Y,dd,dd
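Because the sample lines contain no colon, FS=: makes $1 the whole line, so the one-liner above prints every new.csv line that has no identical line in old.csv. One possible way to get the old-only records instead is to swap the file order and key on a single column. The sketch below assumes the second field is a unique record ID (the question does not state this) and recreates the sample files in a temporary directory; the names seen and deleted are illustrative only.

```shell
#!/bin/sh
# Sketch: list records whose key appears only in old.csv.
# Assumption: field 2 is a unique record ID in both files.
tmp=$(mktemp -d)

cat > "$tmp/old.csv" <<'EOF'
DTL,11111111,1111111111111111,11111111111,Y,N,xx,xx
DTL,22222222,2222222222222222,22222222222,Y,Y,cc,cc
DTL,33333333,3333333333333333,33333333333,Y,Y,dd,dd
DTL,44444444,4444444444444444,44444444444,Y,Y,ss,ss
DTL,55555555,5555555555555555,55555555555,Y,Y,qq,qq
EOF

cat > "$tmp/new.csv" <<'EOF'
DTL,11111111,1111111111111111,11111111111,Y,Y,xx,xx
DTL,22222222,2222222222222222,22222222222,Y,N,cc,cc
DTL,44444444,4444444444444444,44444444444,Y,Y,ss,ss
DTL,55555555,5555555555555555,55555555555,Y,Y,qq,qq
DTL,77777777,7777777777777777,77777777777,N,N,ee,ee
EOF

# While reading the first file (new.csv) NR equals FNR, so its keys are stored.
# For the second file (old.csv), print lines whose key was never stored.
deleted=$(awk -F, 'NR == FNR { seen[$2]; next } !($2 in seen)' \
    "$tmp/new.csv" "$tmp/old.csv")

printf '%s\n' "$deleted"
rm -rf "$tmp"
```

Keying on the whole line rather than on one column would also list the old versions of the updated records, which is why a key column is assumed here.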
I also started an awk script, driven from bash, to achieve this, but could not find a good example to help.
myscript.awk
BEGIN {
    FS = ","    # input field separator
    OFS = ","   # output field separator
}
NR > 1 {
    # flag: N - new record, D - deleted, U - updated
    id = $1
    name = $2
    flag = "N"  # awk string literals need double quotes
    # Print the columns in the new order; the commas make awk join them with OFS.
    print id, name, flag
}
Invoked as:
awk -f myscript.awk old.csv new.csv > formatted.csv
This might work for you:
diff -W999 --side-by-side OLD NEW |
sed '/^[^\t]*\t\s*|\t\(.*\)/{s//\1 U/;b};/^\([^\t]*\)\t*\s*<$/{s//\1 D/;b};/^.*>\t\(.*\)/{s//\1 N/;b};d'
DTL,11111111,1111111111111111,11111111111,Y,Y,xx,xx U
DTL,22222222,2222222222222222,22222222222,Y,N,cc,cc U
DTL,33333333,3333333333333333,33333333333,Y,Y,dd,dd D
DTL,77777777,7777777777777777,77777777777,N,N,ee,ee N
An awk solution along the same lines:
diff -W999 --side-by-side OLD NEW |
awk '/[|][\t]/{split($0,a,"[|][\t]");print a[2]" U"};/[\t] *<$/{split($0,a,"[\t]* *<$");print a[1]" D"};/>[\t]/{split($0,a,">[\t]");print a[2]" N"}'
DTL,11111111,1111111111111111,11111111111,Y,Y,xx,xx U
DTL,22222222,2222222222222222,22222222222,Y,N,cc,cc U
DTL,33333333,3333333333333333,33333333333,Y,Y,dd,dd D
DTL,77777777,7777777777777777,77777777777,N,N,ee,ee N
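The same flags can probably also be produced without diff, by letting awk read both files itself. The following is a minimal sketch rather than a definitive solution. It assumes the second field is a unique record ID in both files (an assumption, not something the question states), and it sorts the result on that field because awk's for-in loop has no defined order. The array name old and the variable flagged are illustrative.

```shell
#!/bin/sh
# Sketch: flag records as U (updated), D (deleted) or N (new) using awk only.
# Assumption: field 2 is a unique record ID in both files.
tmp=$(mktemp -d)

cat > "$tmp/old.csv" <<'EOF'
DTL,11111111,1111111111111111,11111111111,Y,N,xx,xx
DTL,22222222,2222222222222222,22222222222,Y,Y,cc,cc
DTL,33333333,3333333333333333,33333333333,Y,Y,dd,dd
DTL,44444444,4444444444444444,44444444444,Y,Y,ss,ss
DTL,55555555,5555555555555555,55555555555,Y,Y,qq,qq
EOF

cat > "$tmp/new.csv" <<'EOF'
DTL,11111111,1111111111111111,11111111111,Y,Y,xx,xx
DTL,22222222,2222222222222222,22222222222,Y,N,cc,cc
DTL,44444444,4444444444444444,44444444444,Y,Y,ss,ss
DTL,55555555,5555555555555555,55555555555,Y,Y,qq,qq
DTL,77777777,7777777777777777,77777777777,N,N,ee,ee
EOF

flagged=$(awk -F, '
    # First file (old.csv): remember each full line under its key.
    NR == FNR { old[$2] = $0; next }

    # Second file (new.csv): compare against the remembered old line.
    {
        if (!($2 in old)) {
            print $0 " N"        # key absent from the old file
        } else if (old[$2] != $0) {
            print $0 " U"        # same key, different content
        }
        delete old[$2]           # mark this key as handled
    }

    # Keys still left in old[] never appeared in new.csv.
    END { for (k in old) print old[k] " D" }
' "$tmp/old.csv" "$tmp/new.csv" | sort -t, -k2,2)

printf '%s\n' "$flagged"
rm -rf "$tmp"
```

Unchanged records (44444444 and 55555555 in the sample) match their old line exactly and are therefore not printed, which agrees with the sample output in the question.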