Match values in the first column of two files and join the matching lines in a new file

Problem description

I need to match the string in column 1 ($1) of file1.txt against the string in column 1 ($1) of file2.txt. Then I want to join the matching lines and write them to a new file.

cat file1.txt
1050008 5.156725968 8.404038296 124.9198605 3.23E-21    2.33E-17    38.57865782
3310747 5.631470026 8.581936875 124.6039122 3.34E-21    2.33E-17    38.55204806
5910451 4.900364671 8.455329195 124.5720603 3.35E-21    2.33E-17    38.54935989
730156  5.565210738 8.48792701  122.2168789 4.28E-21    2.33E-17    38.34773989

cat file2.txt
4230037 ILMN Controls   ILMN_Controls   ERCC-00071  ILMN_333646 ERCC-00071  ERCC-00071
1050008 ILMN Controls   ILMN_Controls   ERCC-00009  ILMN_333584 ERCC-00009  ERCC-00009
5260356 ILMN Controls   ILMN_Controls   ERCC-00053  ILMN_333628 ERCC-00053  ERCC-00053
3310747 ILMN Controls   ILMN_Controls   ERCC-00144  ILMN_333719 ERCC-00144  ERCC-00144
5910451 ILMN Controls   ILMN_Controls   ERCC-00003  ILMN_333578 ERCC-00003  ERCC-00003
1710435 ILMN Controls   ILMN_Controls   ERCC-00138  ILMN_333713 ERCC-00138  ERCC-00138
1400612 ILMN Controls   ILMN_Controls   ERCC-00084  ILMN_333659 ERCC-00084  ERCC-00084
730156  ILMN Controls   ILMN_Controls   ERCC-00017  ILMN_333592 ERCC-00017  ERCC-00017

I would like the output file to look like this:

out.txt
1050008 5.156725968 8.404038296 124.9198605 3.23E-21    2.33E-17    38.57865782 1050008 ILMN Controls   ILMN_Controls   ERCC-00009  ILMN_333584 ERCC-00009  ERCC-00009
3310747 5.631470026 8.581936875 124.6039122 3.34E-21    2.33E-17    38.55204806 3310747 ILMN Controls   ILMN_Controls   ERCC-00144  ILMN_333719 ERCC-00144  ERCC-00144
5910451 4.900364671 8.455329195 124.5720603 3.35E-21    2.33E-17    38.54935989 5910451 ILMN Controls   ILMN_Controls   ERCC-00003  ILMN_333578 ERCC-00003  ERCC-00003
730156  5.565210738 8.48792701  122.2168789 4.28E-21    2.33E-17    38.34773989 730156  ILMN Controls   ILMN_Controls   ERCC-00017  ILMN_333592 ERCC-00017  ERCC-00017

The files are tab delimited and have missing values in some columns.

There are 31 columns and >47000 lines in file2.txt, and I'm trying to do this in bash (OS X).

If you have a solution, I would greatly appreciate it if you could briefly explain the steps, as I'm very new to this.

Solution

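awk 'BEGIN {
  # split input on tabs only and join output fields with a tab
  FS = OFS = "\t"
  }
NR == FNR {
  # while reading the 1st file
  # store its records in the array f, keyed by column 1
  f[$1] = $0
  next
  }
$1 in f {
  # when match is found
  # print the stored 1st-file line, then the current 2nd-file line
  print f[$1], $0
  }' file1.txt file2.txt > out.txt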
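
Since the question asks for the steps to be explained, here is a small self-contained sketch of the same approach that can be run in an empty directory. The file names small1.txt, small2.txt and small_out.txt and their contents are made up for illustration (they are not the asker's data), and the second sample line of small2.txt deliberately ends in an empty field to mimic a missing value.

```shell
#!/bin/sh
# Hypothetical sample data, tab-separated. These files are assumptions
# for the demo only; they are not the files from the question.
printf '1050008\t5.15\t8.40\n730156\t5.56\t8.48\n' > small1.txt
printf '4230037\tILMN Controls\tERCC-00071\n1050008\tILMN Controls\t\n730156\tILMN Controls\tERCC-00017\n' > small2.txt

awk 'BEGIN { FS = OFS = "\t" }   # split on tabs only, join output with a tab
NR == FNR {                      # true only while reading the first file
  f[$1] = $0                     # remember the whole line, keyed by column 1
  next                           # skip the rule below for first-file lines
}
$1 in f {                        # second file: was this key seen in file one?
  print f[$1], $0                # first-file line, a tab, second-file line
}' small1.txt small2.txt > small_out.txt

cat small_out.txt
```

NR counts lines across all input files while FNR restarts at 1 for each file, so `NR == FNR` holds only for the first file (assuming that file is not empty). Setting FS to a tab keeps a value such as `ILMN Controls` in one field and preserves empty fields. Two things to be aware of: the output follows the line order of the second file, and if a key appears more than once in the first file only its last line is kept.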
