How to merge two files using AWK?

Question

File 1 has 5 fields, A B C D E, where field A is integer-valued.

File 2 has 3 fields, A F G.

The number of rows in File 1 is much larger than in File 2 (20^6 to 5000).

Every value of A in File 1 also appears in field A of File 2.

I would like to merge the two files on field A and carry over F and G.

The desired output is A B C D E F G.

Example

File 1

 A     B     C    D    E
4050 S00001 31228 3286 0
4050 S00012 31227 4251 0
4049 S00001 28342 3021 1
4048 S00001 46578 4210 0
4048 S00113 31221 4250 0
4047 S00122 31225 4249 0
4046 S00344 31322 4000 1

File 2

A     F    G   
4050 12.1 23.6
4049 14.4 47.8   
4048 23.2 43.9
4047 45.5 21.6

Desired output

A    B      C      D   E F    G
4050 S00001 31228 3286 0 12.1 23.6
4050 S00012 31227 4251 0 12.1 23.6
4049 S00001 28342 3021 1 14.4 47.8
4048 S00001 46578 4210 0 23.2 43.9
4048 S00113 31221 4250 0 23.2 43.9
4047 S00122 31225 4249 0 45.5 21.6

Recommended answer

$ awk 'FNR==NR{a[$1]=$2 FS $3;next}{ print $0, a[$1]}' file2 file1
4050 S00001 31228 3286 0 12.1 23.6
4050 S00012 31227 4251 0 12.1 23.6
4049 S00001 28342 3021 1 14.4 47.8
4048 S00001 46578 4210 0 23.2 43.9
4048 S00113 31221 4250 0 23.2 43.9
4047 S00122 31225 4249 0 45.5 21.6
4046 S00344 31322 4000 1
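
For readability, the same one-liner can be spread over several lines with comments. This is only a sketch: the temporary directory and the variable names dir and merged are assumptions made for the example, and the sample rows are copied from the question without the header lines.

```shell
# Recreate the sample data from the question (header lines omitted).
dir=$(mktemp -d)
printf '%s\n' \
  '4050 S00001 31228 3286 0' \
  '4050 S00012 31227 4251 0' \
  '4049 S00001 28342 3021 1' \
  '4048 S00001 46578 4210 0' \
  '4048 S00113 31221 4250 0' \
  '4047 S00122 31225 4249 0' \
  '4046 S00344 31322 4000 1' > "$dir/file1"
printf '%s\n' \
  '4050 12.1 23.6' \
  '4049 14.4 47.8' \
  '4048 23.2 43.9' \
  '4047 45.5 21.6' > "$dir/file2"

merged=$(awk '
  # First file on the command line (file2): remember F and G for each A.
  FNR == NR { a[$1] = $2 FS $3; next }
  # Second file (file1): print the line followed by the stored F and G.
  { print $0, a[$1] }
' "$dir/file2" "$dir/file1")

printf '%s\n' "$merged"
rm -rf "$dir"
```

The small file (file2) is listed first so that only its 5000 or so rows are held in memory while the large file is streamed line by line.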

Explanation (partly based on another question; a bit late though):

FNR is the record number (typically the line number) within the current file, and NR is the total record number across all input files. The == operator is a comparison that is true when its two operands are equal. FNR and NR are equal only while the first file is being read, so FNR==NR{commands} means that the commands inside the braces are executed only while processing the first file (file2 here).
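
To see how the two counters behave, the following sketch prints both for every line of two tiny files. The file names first and second and the variable counters are made up for this illustration.

```shell
# Two small throwaway files: 2 lines and 3 lines.
dir=$(mktemp -d)
printf '%s\n' 'x1' 'x2' > "$dir/first"
printf '%s\n' 'y1' 'y2' 'y3' > "$dir/second"

# Print each line with FNR, NR, and whether FNR == NR holds.
counters=$(awk '{
  print $0, FNR, NR, (FNR == NR ? "first file" : "later file")
}' "$dir/first" "$dir/second")

printf '%s\n' "$counters"
# x1 1 1 first file
# x2 2 2 first file
# y1 1 3 later file
# y2 2 4 later file
# y3 3 5 later file
rm -rf "$dir"
```

FNR restarts at 1 for the second file while NR keeps counting, which is why the comparison is only true for the first file.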

FS is the field separator, and $1, $2, etc. are the 1st, 2nd, etc. fields of a line. a[$1]=$2 FS $3 stores an entry in an associative array (dictionary) named a, with $1 as the key and $2 FS $3 (the second and third fields joined by the field separator) as the value.
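
As a small illustration of what ends up in the array, this sketch fills a from two rows of file2 and dumps it in an END block. The variable name lookup and the " -> " separator are assumptions for the example. The output is piped through sort because awk does not guarantee the iteration order of for (key in a).

```shell
dir=$(mktemp -d)
printf '%s\n' '4050 12.1 23.6' '4049 14.4 47.8' > "$dir/file2"

lookup=$(awk '
  # Key: field A. Value: F and G joined by the field separator.
  { a[$1] = $2 FS $3 }
  # After all input is read, show every key/value pair.
  END { for (key in a) print key " -> " a[key] }
' "$dir/file2" | sort)

printf '%s\n' "$lookup"
# 4049 -> 14.4 47.8
# 4050 -> 12.1 23.6
rm -rf "$dir"
```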

; separates commands.

next means that the remaining commands are skipped for the current line; processing continues with the next line.
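
The effect of next is easiest to see by leaving it out. In this sketch (one row per file, variable names with_next and without_next chosen for the example), dropping next lets the lines of file2 fall through to the print block as well.

```shell
dir=$(mktemp -d)
printf '%s\n' '4050 12.1 23.6' > "$dir/file2"
printf '%s\n' '4050 S00001 31228 3286 0' > "$dir/file1"

# With next: file2 lines are only stored, never printed.
with_next=$(awk 'FNR==NR{a[$1]=$2 FS $3;next}{print $0, a[$1]}' \
  "$dir/file2" "$dir/file1")

# Without next: the file2 line also reaches the print block.
without_next=$(awk 'FNR==NR{a[$1]=$2 FS $3}{print $0, a[$1]}' \
  "$dir/file2" "$dir/file1")

printf 'with next:\n%s\n' "$with_next"
# 4050 S00001 31228 3286 0 12.1 23.6
printf 'without next:\n%s\n' "$without_next"
# 4050 12.1 23.6 12.1 23.6
# 4050 S00001 31228 3286 0 12.1 23.6
rm -rf "$dir"
```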

$0 is the whole line.

{print $0, a[$1]} prints the whole line followed by the value of a[$1]. If $1 is not in the dictionary, a[$1] is empty, so only $0 (plus a trailing separator) is printed. Because of FNR==NR{...;next}, this block runs only for the second file (file1 here).
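
The desired output in the question leaves out the row with A = 4046, which has no match in file2, whereas the one-liner prints it without F and G. If such rows should be dropped, one possible variation (not part of the original answer) is to guard the print with the in operator. The sketch below also keeps the header lines, written here with single spaces, to show that they are merged like any other row; the variable name matched is an assumption for the example.

```shell
dir=$(mktemp -d)
printf '%s\n' \
  'A B C D E' \
  '4050 S00001 31228 3286 0' \
  '4047 S00122 31225 4249 0' \
  '4046 S00344 31322 4000 1' > "$dir/file1"
printf '%s\n' \
  'A F G' \
  '4050 12.1 23.6' \
  '4047 45.5 21.6' > "$dir/file2"

matched=$(awk '
  FNR == NR { a[$1] = $2 FS $3; next }
  # Print only rows whose key was seen in file2; "in" tests for the
  # key without creating an empty entry in the array.
  $1 in a { print $0, a[$1] }
' "$dir/file2" "$dir/file1")

printf '%s\n' "$matched"
# A B C D E F G
# 4050 S00001 31228 3286 0 12.1 23.6
# 4047 S00122 31225 4249 0 45.5 21.6
rm -rf "$dir"
```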
