Merging two files into one based on the first column
Problem description
I have two files, both in the same format: two columns, each containing a number. For example:
file1
1.00 99
2.00 343
3.00 34
...
10.00 343
file2
1.00 0.4
2.00 0.5
3.00 0.34
...
10.00 0.9
And I want to generate the following file (using awk, bash, or Perl):
1.00 99 0.4
2.00 343 0.5
3.00 34 0.34
...
10.00 343 0.9
Thanks.
Recommended answer
join file1 file2
This assumes that the files are sorted on the join field. If they are not, you can do this:
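join <(sort -k 1b,1 file1) <(sort -k 1b,1 file2) | sort -V
join compares keys as plain text, so its inputs must be sorted the way sort -k 1b,1 sorts them. Version-sorted input (sort -V) can make it miss matches such as 10.00 once some keys are unpaired. The trailing sort -V only puts the merged lines back in numeric order.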
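A minimal, self-contained sketch of the join approach, assuming GNU coreutils and using only the four rows shown in the question (the elided rows are left out). It writes sorted copies to temporary files instead of using <(...), which is bash-only, and it sorts with sort -k 1b,1 because join compares keys in plain lexicographic order. The final sort -V only restores numeric order for display. The temporary file names are illustrative.

```shell
#!/bin/sh
# Sketch: merge file1 and file2 on their first column with join(1).
# Assumes GNU coreutils (for sort -V); uses only the rows shown in the question.
set -eu
export LC_ALL=C   # sort and join must agree on the collating order

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' '1.00 99' '2.00 343' '3.00 34' '10.00 343' > "$tmp/file1"
printf '%s\n' '1.00 0.4' '2.00 0.5' '3.00 0.34' '10.00 0.9' > "$tmp/file2"

# join expects keys in the order `sort -k 1b,1` produces (1.00 10.00 2.00 3.00),
# so sort into temporary copies rather than relying on bash-only <(...).
sort -k 1b,1 "$tmp/file1" > "$tmp/file1.sorted"
sort -k 1b,1 "$tmp/file2" > "$tmp/file2.sorted"

# Join on field 1, then restore numeric order of the keys for display.
merged=$(join "$tmp/file1.sorted" "$tmp/file2.sorted" | sort -V)
printf '%s\n' "$merged"
```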
Here's an AWK version (the sort compensates for AWK's unspecified array traversal order):
awk '{a[$1]=a[$1] FS $2} END {for (i in a) print i a[i]}' file1 file2 | sort -V
It seems shorter and more readable than the Perl answer.
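The AWK version can be exercised the same way. The extra row 4.00 12 in file1 below is hypothetical. It is added only to show one difference from join: a key found in just one file is still printed, with a single value, whereas join drops unpaired lines by default.

```shell
#!/bin/sh
# Sketch: the AWK merge from the answer, run on sample data.
# The "4.00 12" row is hypothetical, to show that unpaired keys are kept.
set -eu
export LC_ALL=C

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' '1.00 99' '2.00 343' '3.00 34' '4.00 12' '10.00 343' > "$tmp/file1"
printf '%s\n' '1.00 0.4' '2.00 0.5' '3.00 0.34' '10.00 0.9' > "$tmp/file2"

# a[key] accumulates " value" from every file in which the key appears;
# for (i in a) has no defined order, so sort -V orders the result.
merged=$(awk '{ a[$1] = a[$1] FS $2 }
              END { for (i in a) print i a[i] }' "$tmp/file1" "$tmp/file2" | sort -V)
printf '%s\n' "$merged"
```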
In gawk 4.0 or later, you can set the array traversal order:
awk 'BEGIN {PROCINFO["sorted_in"] = "@ind_num_asc"} {a[$1]=a[$1] FS $2} END {for (i in a) print i a[i]}' file1 file2
and you won't have to use the sort utility. @ind_num_asc stands for index numeric ascending. See the gawk manual sections "Controlling Array Traversal and Array Sorting" and "Using Predefined Array Scanning Orders with gawk".
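A sketch of the gawk-only variant on the same sample rows. It assumes gawk 4.0 or later is installed under the name gawk and simply skips the demo (leaving merged empty) if it is not. No external sort is involved; the ordering comes from PROCINFO["sorted_in"].

```shell
#!/bin/sh
# Sketch: gawk-only variant that orders the output without sort(1).
# Assumes gawk 4.0 or later is installed as `gawk`; skips otherwise.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' '1.00 99' '2.00 343' '3.00 34' '10.00 343' > "$tmp/file1"
printf '%s\n' '1.00 0.4' '2.00 0.5' '3.00 0.34' '10.00 0.9' > "$tmp/file2"

if command -v gawk >/dev/null 2>&1; then
  # "@ind_num_asc": traverse array indices in ascending numeric order.
  merged=$(gawk 'BEGIN { PROCINFO["sorted_in"] = "@ind_num_asc" }
                 { a[$1] = a[$1] FS $2 }
                 END { for (i in a) print i a[i] }' "$tmp/file1" "$tmp/file2")
  printf '%s\n' "$merged"
else
  merged=''
  echo 'gawk not found; skipping' >&2
fi
```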
Note that -V (--version-sort) in the sort commands above requires GNU sort from coreutils 7.0 or later. Without it, plain sort compares the keys as text, so 10.00 lands before 2.00. Thanks to @simlev for pointing out that -V should be used if available.
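A quick way to see why -V matters for these keys. LC_ALL=C is set so the plain-sort result is byte-wise and reproducible; the sample keys are the ones from the question. For keys that are plain decimal numbers like these, POSIX sort -n gives the same order as -V, which may help where GNU sort is not available.

```shell
#!/bin/sh
# Sketch: compare plain sort, sort -V and sort -n on keys from the question.
# Assumes GNU sort (coreutils 7.0 or later) for -V.
set -eu
export LC_ALL=C   # byte-wise collation, so the plain-sort result is reproducible

keys='10.00
2.00
1.00
3.00'

plain=$(printf '%s\n' "$keys" | sort)       # lexicographic: 10.00 sorts before 2.00
version=$(printf '%s\n' "$keys" | sort -V)  # version sort: 10.00 sorts last
numeric=$(printf '%s\n' "$keys" | sort -n)  # numeric sort: same order as -V here

echo '--- sort ---';    printf '%s\n' "$plain"
echo '--- sort -V ---'; printf '%s\n' "$version"
echo '--- sort -n ---'; printf '%s\n' "$numeric"
```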