bash merge files by matching columns
Question
I have two files:
File1
12 abc
34 cde
42 dfg
11 df
9 e
File2
23 abc
24 gjr
12 dfg
8 df
I want to merge the files side by side, matching rows on column 2, so that the output looks like this:
File1 File2
12 23 abc
42 12 dfg
11 8 df
34 NA cde
9 NA e
NA 24 gjr
How can I do this?
I think it would be something like this:
cat File* >> tmp; sort tmp | uniq -c | awk '{print $2}' > column2; for i in $(cat column2); do grep -w "$i" File*
But this is where I am stuck: after grepping, I don't know how to combine the files column by column and write NA where a value is missing.
Hope someone could help me with this.
Recommended answer
Since I was testing with bash 3.2 running as sh (which does not support process substitution when invoked as sh), I used two temporary files to get the data ready for use with join:
$ sort -k2b File2 > f2.sort
$ sort -k2b File1 > f1.sort
$ cat f1.sort
12 abc
34 cde
11 df
42 dfg
9 e
$ cat f2.sort
23 abc
8 df
12 dfg
24 gjr
$ join -1 2 -2 2 -o 1.1,2.1,0 -a 1 -a 2 -e NA f1.sort f2.sort
12 23 abc
34 NA cde
11 8 df
42 12 dfg
9 NA e
NA 24 gjr
$
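The temporary-file steps above could be wrapped in a small function that cleans up after itself. This is only a sketch under a few assumptions: merge_by_col2 is a hypothetical helper name (not from the original answer), mktemp is available on the system, and both inputs are two-column, whitespace-separated files like File1 and File2.

```shell
#!/bin/sh
# merge_by_col2 is a hypothetical helper name, not part of the original answer.
# Usage: merge_by_col2 FILE_A FILE_B
merge_by_col2() {
    # Private scratch directory for the two sorted copies (assumes mktemp exists).
    mbc_tmp=$(mktemp -d) || return 1

    # join needs both inputs sorted on the join field (column 2 here).
    sort -k2b "$1" > "$mbc_tmp/a.sort" &&
    sort -k2b "$2" > "$mbc_tmp/b.sort" &&
    # -a 1 -a 2 also prints unpaired lines from either file, -e NA fills the
    # missing side, and -o selects column 1 of each file plus the join field (0).
    join -1 2 -2 2 -o 1.1,2.1,0 -a 1 -a 2 -e NA \
        "$mbc_tmp/a.sort" "$mbc_tmp/b.sort"
    mbc_status=$?

    # Remove the scratch directory whether or not the pipeline succeeded.
    rm -rf "$mbc_tmp"
    return "$mbc_status"
}
```

Calling merge_by_col2 File1 File2 should print the same six lines as the join command above, without leaving f1.sort and f2.sort behind.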
With process substitution, you could write:
join -1 2 -2 2 -o 1.1,2.1,0 -a 1 -a 2 -e NA <(sort -k2b File1) <(sort -k2b File2)
If you want the data formatted differently, use awk to post-process the output:
$ join -1 2 -2 2 -o 1.1,2.1,0 -a 1 -a 2 -e NA f1.sort f2.sort |
> awk '{ printf "%-5s %-5s %s\n", $1, $2, $3 }'
12    23    abc
34    NA    cde
11    8     df
42    12    dfg
9     NA    e
NA    24    gjr
$
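The join output is ordered by the key, whereas the sample output in the question lists matched rows first, then rows found only in File1, then rows found only in File2, under a header line. If that exact layout matters, an awk-only approach is one option. The following is a minimal sketch, not part of the original answer: merge_in_question_order is a hypothetical name, and it assumes each key appears at most once per file and that the first file is not empty (the NR == FNR test relies on that).

```shell
#!/bin/sh
# merge_in_question_order is a hypothetical helper name.
# Usage: merge_in_question_order FILE_A FILE_B
merge_in_question_order() {
    awk '
        # First file: store column 1 under its key (column 2), keep key order.
        NR == FNR { a[$2] = $1; order_a[++na] = $2; next }
        # Second file: same bookkeeping in separate arrays.
        { b[$2] = $1; order_b[++nb] = $2 }
        END {
            # Header built from the two file names given on the command line.
            print ARGV[1], ARGV[2]
            # 1. Keys present in both files, in first-file order.
            for (i = 1; i <= na; i++) {
                k = order_a[i]
                if (k in b) print a[k], b[k], k
            }
            # 2. Keys present only in the first file.
            for (i = 1; i <= na; i++) {
                k = order_a[i]
                if (!(k in b)) print a[k], "NA", k
            }
            # 3. Keys present only in the second file.
            for (i = 1; i <= nb; i++) {
                k = order_b[i]
                if (!(k in a)) print "NA", b[k], k
            }
        }
    ' "$1" "$2"
}
```

Unlike join, this needs no sorting, but it holds both files in memory, so it is better suited to small inputs.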