Comparing two text files printing result in new header


Problem description

Okay i will reupdate this

I have 2 files - File1.txt , File2.txt

File1 is base template

File2 is having status result

file1.txt

N1,N2,N3,N4,N5,N6
XX,ZZ,XC,EE,RR,BB
XC,CF,FG,RG,GH,GH

file2.txt

DF,GH,MH,FR,FG,GH,NA
XX,ZZ,XC,EE,RR,BB,OK

The command below compares column 1 of both files. If a key matches, it takes the value from the 7th field of file2.txt and appends it to file1.txt as a new last column with a new header.

If no match is found, NA is written instead.

Command used :

awk -F, '
  FNR==NR { a[$1]=$7; next }
  FNR==1  { print $0; len=length($0); next }
  {
    printf $0
    cont=(($1 in a) ? ","a[$1] : ",NA")
    for ( i=length($0)+1; i<=len-length(cont); i++)
      printf " " 
    print cont
  }
'  file2.txt file1.txt > tmp &&

Day1 - After running above command

N1,N2,N3,N4,N5,N6,D1
XX,ZZ,XC,EE,RR,BB,OK
XC,CF,FG,RG,GH,GH,NA

Day 2 - After running above command

N1,N2,N3,N4,N5,N6,D1,D2
XX,ZZ,XC,EE,RR,BB,OK,OK
XC,CF,FG,RG,GH,GH,NA,NA

On Day 3 I inserted a new row at the bottom of file1.txt

N1,N2,N3,N4,N5,N6,D1,D2
XX,ZZ,XC,EE,RR,BB,OK,OK
XC,CF,FG,RG,GH,GH,NA,NA
DM,LC,VF,GR,GH,ES

Now when I run the above command on Day 3, I need output like this

N1,N2,N3,N4,N5,N6,D1,D2,D3
XX,ZZ,XC,EE,RR,BB,OK,OK,OK
XC,CF,FG,RG,GH,GH,NA,NA,NA
DM,LC,VF,GR,GH,ES,,,NA

Solution

This awk script seems to do the job:

awk -F, '
BEGIN   { OFS = FS }
FNR==NR { a[$1] = $7; next }
FNR==1  { n1 = n = NF + 1; $n = "D" (n-6); print; next }
        { $n1 = ($1 in a) ? a[$1] : "NA"; print }
' file2.txt file1.txt

OFS is the output field separator; FS is the (input) field separator. Both are set to a comma: FS by the -F, option and OFS by the assignment in the BEGIN block. This makes it easy to get the correct number of fields in the output, because awk rebuilds the record with OFS whenever a field is assigned. awk's string concatenation with no operator, exemplified by "D" (n-6), is slightly weird; you get used to it, up to a point, but it still looks a little odd.
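To see the effect of the OFS assignment and of the operator-less concatenation in isolation, here is a minimal sketch. The sample header line is taken from the question; the shell variable names `without_ofs` and `with_ofs` are made up for illustration.

```shell
# Without OFS = FS, assigning to a field rebuilds the record
# with the default OFS (a single space), so the commas are lost.
without_ofs=$(echo 'N1,N2,N3,N4,N5,N6' |
    awk -F, '{ n = NF + 1; $n = "D" (n-6); print }')

# With OFS = FS, the rebuilt record keeps its commas.
# "D" (n-6) concatenates the string "D" with the number n-6.
with_ofs=$(echo 'N1,N2,N3,N4,N5,N6' |
    awk -F, 'BEGIN { OFS = FS } { n = NF + 1; $n = "D" (n-6); print }')

echo "$without_ofs"   # N1 N2 N3 N4 N5 N6 D1
echo "$with_ofs"      # N1,N2,N3,N4,N5,N6,D1
```

The parentheses around n-6 are what keep the subtraction from being applied to the concatenated string.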

Example

The example run uses a program ow that has the synopsis:

ow file cmd …args…

It preserves the contents of the file by having the cmd …args… write to a temporary file, and if the command succeeds (exit status 0) and the output is not empty, it then preserves a copy of the original, ignores a number of signals, and then copies the temporary output over the original and cleans up. It is rather useful — code at the bottom. This is how I did my test. Clearly, I could use tmp=$(mktemp tmp.XXXXXX); awk … file1.txt > $tmp; mv $tmp file1.txt instead, or something along those lines. However, since I have ow, I use it.

$ cat file1.txt
N1,N2,N3,N4,N5,N6
XX,ZZ,XC,EE,RR,BB
XC,CF,FG,RG,GH,GH
$ ow file1.txt awk -F, '
> BEGIN   { OFS = FS }
> FNR==NR { a[$1] = $7; next }
> FNR==1  { n1 = n = NF + 1; $n = "D" (n-6); print; next }
>         { $n1 = ($1 in a) ? a[$1] : "NA"; print }
> ' file2.txt file1.txt
$ cat file1.txt
N1,N2,N3,N4,N5,N6,D1
XX,ZZ,XC,EE,RR,BB,OK
XC,CF,FG,RG,GH,GH,NA
$ ow file1.txt awk -F, '
> BEGIN   { OFS = FS }
> FNR==NR { a[$1] = $7; next }
> FNR==1  { n1 = n = NF + 1; $n = "D" (n-6); print; next }
>         { $n1 = ($1 in a) ? a[$1] : "NA"; print }
> ' file2.txt file1.txt
$ cat file1.txt
N1,N2,N3,N4,N5,N6,D1,D2
XX,ZZ,XC,EE,RR,BB,OK,OK
XC,CF,FG,RG,GH,GH,NA,NA
$ echo DM,LC,VF,GR,GH,ES >> file1.txt
$ ow file1.txt awk -F, '
> BEGIN   { OFS = FS }
> FNR==NR { a[$1] = $7; next }
> FNR==1  { n1 = n = NF + 1; $n = "D" (n-6); print; next }
>         { $n1 = ($1 in a) ? a[$1] : "NA"; print }
> ' file2.txt file1.txt
$ cat file1.txt
N1,N2,N3,N4,N5,N6,D1,D2,D3
XX,ZZ,XC,EE,RR,BB,OK,OK,OK
XC,CF,FG,RG,GH,GH,NA,NA,NA
DM,LC,VF,GR,GH,ES,,,NA
$

Note that as you assign to $i and i is larger than NF was, NF increases, and any missing fields are created as empty fields.
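A small sketch of that behaviour on its own, using the six-field row from the question and assigning straight to field 9 (the variable name `padded` is just for illustration):

```shell
# The input record has 6 fields. Assigning to $9 makes awk create
# empty fields 7 and 8, raise NF to 9, and rebuild $0 using OFS.
padded=$(echo 'DM,LC,VF,GR,GH,ES' |
    awk -F, 'BEGIN { OFS = FS } { $9 = "NA"; print NF ": " $0 }')

echo "$padded"   # 9: DM,LC,VF,GR,GH,ES,,,NA
```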

The first working version of this script had more complex logic, with a loop creating the empty fields, but since awk will do that automatically, the script simplified considerably. You'll often find that with a bit of time and care, initial solutions can be simplified and cleaned up.

However, it is probably also relevant to point out that this code is very trusting. It doesn't ensure that there are exactly 7 fields in file2.txt. It doesn't check that each line in file1.txt has either the same number of fields as the first line in the file or exactly 6 fields. If you supply screwy data in, you get screwy data out — the age-old GIGO principle: Garbage In, Garbage Out.
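If you want some protection against that, one possibility is a separate checking pass before the real run. The following is only a sketch under assumptions: the rules (7 fields in file2.txt; each file1.txt row has either the header's field count or exactly 6) are taken from the paragraph above, while the sample data, the message wording, and the names `workdir`, `problems` and `width` are invented for the example.

```shell
# Build deliberately flawed sample files in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 'DF,GH,MH,FR,FG,GH,NA' 'XX,ZZ,XC,EE,RR,BB' > "$workdir/file2.txt"
printf '%s\n' 'N1,N2,N3,N4,N5,N6,D1' 'XX,ZZ,XC,EE,RR,BB,OK' \
              'DM,LC,VF,GR,GH,ES' 'XC,CF,FG' > "$workdir/file1.txt"

# Report every line whose field count looks wrong; print nothing if all is well.
problems=$(cd "$workdir" && awk -F, '
FNR == NR { if (NF != 7) print FILENAME ":" FNR ": expected 7 fields, got " NF; next }
FNR == 1  { width = NF; next }   # remember the header width of file1.txt
NF != width && NF != 6 { print FILENAME ":" FNR ": expected " width " or 6 fields, got " NF }
' file2.txt file1.txt)

echo "$problems"
rm -rf "$workdir"
```

With the sample data this reports line 2 of file2.txt (6 fields) and line 4 of file1.txt (3 fields). An empty `problems` value would mean it is reasonable to go ahead with the column-adding script.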

ow

:   "@(#)$Id: ow.sh,v 1.6 2005/06/30 18:14:08 jleffler Exp $"
#
#   Overwrite file
#   From: The UNIX Programming Environment by Kernighan and Pike
#   Amended: remove PATH setting; handle file names with blanks.

case $# in
0|1)    echo "Usage: $0 file command [arguments]" 1>&2
    exit 1;;
esac

file="$1"
shift
new=${TMPDIR:-/tmp}/ovrwr.$$.1
old=${TMPDIR:-/tmp}/ovrwr.$$.2

trap "rm -f '$new' '$old' ; exit 1" 0 1 2 15

if "$@" >"$new"
then
    cp "$file" "$old"
    trap "" 1 2 15
    cp "$new" "$file"
    rm -f "$new" "$old"
    trap 0
    exit 0
else
    echo "$0: $1 failed - $file unchanged" 1>&2
    rm -f "$new" "$old"
    trap 0
    exit 1
fi


Adding date instead of Dn to heading

Is it possible that awk can print a date in the header instead of D1?

If you want the current date added, you have two main options. One: if you are using GNU awk (gawk, often also installed as awk), its time functions make it easy. Failing that, awk -v date=$(date +'%Y-%m-%d') -F, … has the system command date format a value and pass it into the awk script as the variable date, which you can then print where you want it. If you want arbitrary dates passed in, then the second mechanism is the one to use.

awk -F, -v date=$(date +'%Y-%m-%d') '
BEGIN   { OFS = FS }
FNR==NR { a[$1] = $7; next }
FNR==1  { n1 = n = NF + 1; $n = date; print; next }
        { $n1 = ($1 in a) ? a[$1] : "NA"; print }
' file2.txt file1.txt

That forces today's date into the command. You can also do things prospectively or retrospectively, such as:

tmp=$(mktemp coladd.XXXXXXXXX)
trap "rm -f $tmp; exit 1" 0 1 2 3 13 15

for dd in $(seq 1 31)
do
    awk -F, -v date="2014-12-$dd" '
    BEGIN   { OFS = FS }
    FNR==NR { a[$1] = $7; next }
    FNR==1  { n1 = n = NF + 1; $n = date; print; next }
            { $n1 = ($1 in a) ? a[$1] : "NA"; print }
    ' file2.txt file1.txt > $tmp
    mv $tmp file1.txt
done

Given this extra flexibility, I'd recommend using the externally-defined date over GNU's internal date manipulating functions, but YMMV.
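For comparison, the gawk-only option mentioned earlier might look roughly like this. It is a sketch, not part of the tested runs above: strftime() is a gawk extension rather than POSIX awk, so the example is guarded by a check that a command named gawk exists, and the variable name `header` is made up.

```shell
if command -v gawk >/dev/null 2>&1
then
    # strftime() with only a format argument formats the current time.
    header=$(echo 'N1,N2,N3,N4,N5,N6' |
        gawk -F, 'BEGIN { OFS = FS } { $(NF+1) = strftime("%Y-%m-%d"); print }')
else
    header="gawk not installed"
fi

echo "$header"
```

This can only ever stamp the current date, which is why passing the date in with -v remains the more flexible choice.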
