awk + bash: combining an arbitrary number of files

Question

I have a script that takes a number of data files with identical layout but different data and combines a specified data column from each into a new file, like this:

gawk '{
        names[$1]= 1;
        data[$1,ARGIND]= $2
} END {
        for (i in names) print i"\t"data[i,1]"\t"data[i,2]"\t"data[i,3]
}' $1 $2 $3 > combined_data.txt

... where the row IDs are in the first column and the data of interest is in the second column.

This works nicely, but not for an arbitrary number of files. I could simply add $4 $5 ... $n to the last line, up to whatever maximum number of files I think I need, and add the matching "\t"data[i,4]"\t"data[i,5] ... "\t"data[i,n] terms to the print line above it. That does seem to work even when fewer than n files are given (awk seems to ignore the fact that n is larger than the number of input files), but it feels like an "ugly" solution. Is there a way to make this script (or something that gives the same result) take an arbitrary number of input files?

Or, even better, can you somehow incorporate a find in there that searches through subfolders and finds files matching some criterion?

Here is some example data:

file.1:

A      554
B       13
C      634
D       84
E        9

file.2:

C      TRUE
E      TRUE
F      FALSE

Expected output:

A      554
B       13
C      634       TRUE
D       84
E        9       TRUE
F                FALSE

Recommended answer

This may be what you're looking for (it uses GNU awk for ARGIND, just like your original script):

$ cat tst.awk
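BEGIN { OFS="\t" }
# Record each key the first time it appears, so rows keep their first-seen order.
!seen[$1]++ { keys[++numKeys]=$1 }
# Store the value under (key, index of the current file on the command line).
{ vals[$1,ARGIND]=$2 }
END {
    # In END, ARGIND is still the index of the last file read; when every
    # argument is a file, that equals the number of data columns.
    for (rowNr=1; rowNr<=numKeys; rowNr++) {
        key = keys[rowNr]
        printf "%s%s", key, OFS
        for (colNr=1; colNr<=ARGIND; colNr++) {
            # Missing values print as empty fields; the last column ends the line.
            printf "%s%s", vals[key,colNr], (colNr<ARGIND?OFS:ORS)
        }
    }
}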

$ awk -f tst.awk file1 file2
A       554
B       13
C       634     TRUE
D       84
E       9       TRUE
F               FALSE
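
ARGIND is specific to GNU awk. Where only a POSIX awk is available, one possible substitute is a counter that is incremented on the first line of each file. The sketch below wraps that idea in a shell function so that "$@" passes on however many file names the caller supplies. The names combine_columns and fileNr are made up for this example, and the counter assumes that no input file is empty (an empty file never produces a first line, so it would get no column).

```shell
# Sketch only: combine_columns and fileNr are hypothetical names, not part of the original answer.
# Assumes no input file is empty; an empty file would not get a column of its own.
combine_columns() {
    awk '
        BEGIN { OFS = "\t" }
        # First line of each file: move on to the next output column (stands in for ARGIND).
        FNR == 1 { fileNr++ }
        # Remember each key the first time it appears, to keep first-seen row order.
        !seen[$1]++ { keys[++numKeys] = $1 }
        { vals[$1, fileNr] = $2 }
        END {
            for (rowNr = 1; rowNr <= numKeys; rowNr++) {
                key = keys[rowNr]
                printf "%s%s", key, OFS
                for (colNr = 1; colNr <= fileNr; colNr++)
                    printf "%s%s", vals[key, colNr], (colNr < fileNr ? OFS : ORS)
            }
        }
    ' "$@"
}
```

Inside a script, calling combine_columns "$@" > combined_data.txt would take the place of the fixed $1 $2 $3 list from the question.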

If you don't care about the order in which the rows are output, then all you need is:

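BEGIN { OFS="\t" }
# Referencing keys[$1] is enough to create the element; no value is needed.
{ vals[$1,ARGIND]=$2; keys[$1] }
END {
    # "for (key in keys)" visits the keys in an unspecified order.
    for (key in keys) {
        printf "%s%s", key, OFS
        for (colNr=1; colNr<=ARGIND; colNr++) {
            printf "%s%s", vals[key,colNr], (colNr<ARGIND?OFS:ORS)
        }
    }
}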
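
The question also asks about using find to collect the input files, which the answer above does not cover. One possible approach, as a rough sketch: let find print the matching paths, sort them so the column order is stable, and hand them all to a single awk run through xargs. The names combine_found, AWK_BIN and AWK_SCRIPT are invented for this example. It assumes tst.awk has been saved as shown above, and that find, sort and xargs support the NUL-separated options (-print0, -z, -0), as the GNU and BSD versions do.

```shell
# Sketch only: combine_found, AWK_BIN and AWK_SCRIPT are hypothetical names.
# AWK_BIN defaults to gawk because tst.awk relies on ARGIND.
combine_found() {
    dir=$1        # directory to search, including its subfolders
    pattern=$2    # file name pattern for find -name, e.g. 'file.*'
    # NUL-separated names survive spaces and newlines in paths;
    # sorting makes the column order repeatable between runs.
    find "$dir" -type f -name "$pattern" -print0 |
        sort -z |
        xargs -0 "${AWK_BIN:-gawk}" -f "${AWK_SCRIPT:-tst.awk}"
}
```

A call might look like combine_found . 'file.*' > combined_data.txt. Quote the pattern so the shell does not expand it before find sees it. With a very large number of files, xargs may split the list across several awk runs, which would produce several separate tables instead of one.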
