Merging two files into one based on the first column

Question

I have two files in the same format: two columns, each containing a number. For example:

File 1

1.00    99
2.00    343
3.00    34
...
10.00   343

File 2

1.00    0.4
2.00    0.5
3.00    0.34
...
10.00   0.9

I want to generate the following file (using awk, bash, or perl):

1.00    99      0.4 
2.00    343     0.5      
3.00    34      0.34
...
10.00   343     0.9

Thanks.

Recommended answer

join file1 file2

This assumes that the files are sorted on the join field. If they are not, you can do this:

join <(sort -V file1) <(sort -V file2)
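Process substitution, the <(...) syntax, is a bash feature, and join expects both inputs in the collation order that sort produces by default. The following is a minimal portable sketch rather than part of the original answer: it assumes a POSIX shell, sorts both inputs on the first field for join, and restores numeric order at the end. The temporary directory and the sample data are illustrative.

```shell
# Illustrative sample data; in practice file1 and file2 already exist.
tmp=$(mktemp -d)
printf '10.00 343\n2.00 343\n1.00 99\n3.00 34\n' > "$tmp/file1"
printf '3.00 0.34\n1.00 0.4\n10.00 0.9\n2.00 0.5\n' > "$tmp/file2"

# LC_ALL=C makes sort and join agree on a byte-wise ordering of the key.
LC_ALL=C sort -k 1,1 "$tmp/file1" > "$tmp/file1.sorted"
LC_ALL=C sort -k 1,1 "$tmp/file2" > "$tmp/file2.sorted"

# join pairs lines on the first field; sort -n then restores numeric order.
merged=$(LC_ALL=C join "$tmp/file1.sorted" "$tmp/file2.sorted" | LC_ALL=C sort -n)
printf '%s\n' "$merged"

rm -rf "$tmp"
```

Sorting lexically before join and numerically afterwards keeps join's ordering requirement satisfied even when some keys appear in only one file.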

Here's an AWK version (the sort compensates for AWK's unspecified array traversal order):

awk '{a[$1]=a[$1] FS $2} END {for (i in a) print i a[i]}' file1 file2 | sort -V
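If the first file is already in the desired order, a common alternative is to load the second file into an array and then stream the first one, which avoids the final sort. This is a sketch of that idiom, not part of the original answer. It assumes keys in file2 are unique (a later duplicate overwrites an earlier one) and that file2 is not empty, and it prints only keys present in both files. The sample data is illustrative.

```shell
# Illustrative sample data; file2 is deliberately out of order.
tmp=$(mktemp -d)
printf '1.00 99\n2.00 343\n3.00 34\n10.00 343\n' > "$tmp/file1"
printf '3.00 0.34\n10.00 0.9\n1.00 0.4\n2.00 0.5\n' > "$tmp/file2"

# Arguments are passed as file2 file1: the first is loaded, the second streamed.
# NR == FNR holds only while awk is reading the first file argument.
merged=$(awk 'NR == FNR { b[$1] = $2; next }
              $1 in b   { print $1, $2, b[$1] }' "$tmp/file2" "$tmp/file1")
printf '%s\n' "$merged"

rm -rf "$tmp"
```

Because the output follows the line order of file1, no array traversal is involved and the result does not depend on the awk implementation.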

It seems shorter and more readable than the Perl answer.

In gawk 4 or later, you can set the array traversal order:

awk 'BEGIN {PROCINFO["sorted_in"] = "@ind_num_asc"} {a[$1]=a[$1] FS $2} END {for (i in a) print i a[i]}' file1 file2

This way you won't need the sort utility. @ind_num_asc stands for index, numeric, ascending. See "Controlling Array Traversal and Array Sorting" and "Using Predefined Array Scanning Orders with gawk" in the gawk manual.

Note that -V (--version-sort) in the sort commands above requires GNU sort from coreutils 7.0 or later. Thanks to @simlev for pointing out that it should be used if available.
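Where it is unclear whether the installed sort supports -V, one option is to probe for it and fall back to the POSIX -n (numeric) flag. This is a hedged sketch with illustrative data and a hypothetical variable name, sort_flag. For keys like those in the question both flags give the same order, but they can differ elsewhere: -V places 1.5 before 1.10, while -n places 1.10 first.

```shell
# Probe whether this sort accepts -V; otherwise use POSIX numeric sort.
if printf '1\n' | sort -V >/dev/null 2>&1; then
    sort_flag=-V
else
    sort_flag=-n
fi

# Illustrative input with keys in the same style as the question.
sorted=$(printf '10.00 343\n2.00 343\n1.00 99\n' | sort "$sort_flag")
printf '%s\n' "$sorted"
```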
