Sort a file based on a column in another file


Problem description

I have two files, both in the format:

loc1 num1 num2
loc2 num3 num4

The first column is the location. I want to use the order of the locations in the first file to sort the second file, so that I can put the two files together with the numbers lined up against the right location.

I could write a Perl script to do this, but I feel there might be a quick and easy shell/awk command for it. Do you have any ideas?

Thanks.

Edit:

Here is the input. I now actually want to use column 2 in file 1 to sort file 2.

File 1:

GID     location        NAME    GWEIGHT C1SI    M1CO    M1SI    C1LY    M1LY    C1CO    C1LI    M1LI
AID                             ARRY2X  ARRY1X  ARRY3X  ARRY4X  ARRY5X  ARRY0X  ARRY6X  ARRY7X
EWEIGHT                         1.000000        1.000000        1.000000        1.000000        1.000000        1.000000        1.000000        1.000000
GENE735X        chr17:66199278-66199496 chr17:66199278-66199496 1.000000        0.211785        -0.853890       1.071875        0.544136        0.703871     0.371880 0.218960        -2.268618
GENE1562X       chr10:80097054-80097298 chr10:80097054-80097298 1.000000        0.533673        -0.397202       0.783363        0.109824        -0.436342    0.158667 0.475748        -1.227730
GENE6579X       chr19:23694188-23694395 chr19:23694188-23694395 1.000000        0.127748        -0.203827       0.846738        0.045599        -0.211767    0.415442 0.282123        -1.302055

File 2:

GID     location        NAME    GWEIGHT C1SI    M1CO    M1SI    C1LY    M1LY    C1CO    C1LI    M1LI
AID                             ARRY2X  ARRY1X  ARRY3X  ARRY4X  ARRY5X  ARRY0X  ARRY6X  ARRY7X
EWEIGHT                         1.000000        1.000000        1.000000        1.000000        1.000000        1.000000        1.000000        1.000000
GENE6579X       chr19:23694188-23694395 chr19:23694188-23694395 1.000000        0.127748        -0.203827       0.846738        0.045599        -0.211767    0.415442 0.282123        -1.302055
GENE735X        chr17:66199278-66199496 chr17:66199278-66199496 1.000000        0.211785        -0.853890       1.071875        0.544136        0.703871     0.371880 0.218960        -2.268618
GENE1562X       chr10:80097054-80097298 chr10:80097054-80097298 1.000000        0.533673        -0.397202       0.783363        0.109824        -0.436342    0.158667 0.475748        -1.227730

Solution

An awk solution: store the 2nd file in memory, then loop over the first file, emitting matching lines from the 2nd file:

awk 'FNR==NR {x2[$1] = $0; next} $1 in x2 {print x2[$1]}' second first

Implementing @Barmar's comment:

join -1 2 -o "1.1 1.2 2.2 2.3" <(cat -n first | sort -k2) <(sort second) | 
sort -n | 
cut -d ' ' -f 2-
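
The join pipeline above can be unrolled into separate steps, which may make it easier to follow. This is only a sketch: it uses temporary files instead of process substitution, assumes `mktemp -d` and `cat -n` are available, and the intermediate file names (`first.numbered`, `second.sorted`) are made up for illustration. `LC_ALL=C` is set so that sort and join agree on the collation order.

```shell
# Recreate the sample files in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 'foo x y' 'bar x y' 'baz x y' > "$workdir/first"
printf '%s\n' 'bar x1 y1' 'baz x2 y2' 'foo x3 y3' > "$workdir/second"

# Step 1: number the lines of "first" (this remembers the wanted order), then sort by key.
cat -n "$workdir/first" | LC_ALL=C sort -k2 > "$workdir/first.numbered"

# Step 2: sort "second" by its key, because join needs both inputs sorted on the join field.
LC_ALL=C sort "$workdir/second" > "$workdir/second.sorted"

# Step 3: join field 2 of the numbered file with field 1 of "second", keeping
#         the line number, the key, and the two data columns of "second".
# Step 4: sort numerically on the line number to restore the order of "first",
#         then cut the line number off again.
joined=$(LC_ALL=C join -1 2 -o "1.1 1.2 2.2 2.3" \
        "$workdir/first.numbered" "$workdir/second.sorted" |
    sort -n |
    cut -d ' ' -f 2-)
printf '%s\n' "$joined"
# foo x3 y3
# bar x1 y1
# baz x2 y2

rm -rf "$workdir"
```

Because the `-o` list names columns 2.2 and 2.3 explicitly, this form only fits files with exactly three columns; wider files would need a longer list.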


Note to other answerers: I tested with these files:

$ cat first
foo x y
bar x y
baz x y
$ cat second
bar x1 y1
baz x2 y2
foo x3 y3
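
To reproduce the test, something like the following should work. It is a minimal sketch that recreates the two sample files in a scratch directory (assuming `mktemp -d` is available) and runs the awk one-liner on them; the variable name `reordered` is just for illustration.

```shell
# Recreate the two sample files.
workdir=$(mktemp -d)
printf '%s\n' 'foo x y' 'bar x y' 'baz x y' > "$workdir/first"
printf '%s\n' 'bar x1 y1' 'baz x2 y2' 'foo x3 y3' > "$workdir/second"

# Reorder "second" so that its keys follow the order found in "first".
reordered=$(awk 'FNR==NR {x2[$1] = $0; next} $1 in x2 {print x2[$1]}' \
    "$workdir/second" "$workdir/first")
printf '%s\n' "$reordered"
# foo x3 y3
# bar x1 y1
# baz x2 y2

rm -rf "$workdir"
```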


Explanation of

awk 'FNR==NR {x2[$1] = $0; next} $1 in x2 {print x2[$1]}' second first

This part reads the 1st file in the command line parameters (here, "second"):

FNR==NR {x2[$1] = $0; next}

The condition FNR == NR is true only while awk reads the first named file (assuming that file is not empty). FNR is awk's "File Record Number" variable, which restarts at 1 for each input file; NR is the running record number across all input files. The current line is stored in an associative array named x2 (not a great variable name), indexed by the first field of the record.
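
A quick way to see the two counters side by side is to print them for every line. The sketch below reuses the sample files from the test (recreated in a scratch directory, assuming `mktemp -d` is available) and runs awk from inside that directory so that FILENAME shows the short names.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'foo x y' 'bar x y' 'baz x y' > "$workdir/first"
printf '%s\n' 'bar x1 y1' 'baz x2 y2' 'foo x3 y3' > "$workdir/second"

# Print the file name, the per-file counter, and the global counter for each line.
counters=$(cd "$workdir" && awk '{ print FILENAME, "FNR=" FNR, "NR=" NR }' second first)
printf '%s\n' "$counters"
# second FNR=1 NR=1
# second FNR=2 NR=2
# second FNR=3 NR=3
# first FNR=1 NR=4
# first FNR=2 NR=5
# first FNR=3 NR=6

rm -rf "$workdir"
```

The two counters agree only on the lines of "second", which is why FNR == NR singles out the first file on the command line.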

The next condition, $1 in x2, is only reached once the file "second" has been completely read, because the next statement skips it for every line of that file. It looks at the first field of each line in the file named "first", and the action prints the corresponding line from "second", which has been stored in the array.
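
One consequence of the $1 in x2 guard is that keys present in "first" but absent from "second" are skipped silently. If you would rather hear about them, a possible variation is sketched below. The extra key `qux`, the message text, and the `missing.log` file are made up for illustration, and writing to "/dev/stderr" from awk is assumed to work on your system (it does in gawk, mawk, and on Linux generally).

```shell
workdir=$(mktemp -d)
printf '%s\n' 'foo x y' 'qux x y' 'bar x y' > "$workdir/first"
printf '%s\n' 'bar x1 y1' 'foo x3 y3' > "$workdir/second"

matched=$(awk '
    FNR == NR { x2[$1] = $0; next }     # first file: remember every line by its key
    $1 in x2  { print x2[$1]; next }    # key is known: emit the stored line
              { print "no match for " $1 > "/dev/stderr" }   # key is unknown: report it
' "$workdir/second" "$workdir/first" 2> "$workdir/missing.log")
missing=$(cat "$workdir/missing.log")

printf '%s\n' "$matched"
# foo x3 y3
# bar x1 y1
printf '%s\n' "$missing"
# no match for qux

rm -rf "$workdir"
```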

Note that the order of the files in the awk command is important. Since you control the output based on the file named "first", it must be the last file processed by awk.
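
For the edited question, where column 2 of file 1 should drive the order of file 2, the same idea should carry over by keying on $2 instead of $1. The sketch below is untested against the real data: the file names `file1` and `file2`, the array name `row`, and the sample rows (the question's data rows cut down to three columns) are assumptions for illustration.

```shell
workdir=$(mktemp -d)
# Three-column excerpts of the question's data rows: GID, location, one value.
printf '%s\n' \
    'GENE735X chr17:66199278-66199496 0.211785' \
    'GENE1562X chr10:80097054-80097298 0.533673' \
    'GENE6579X chr19:23694188-23694395 0.127748' > "$workdir/file1"
printf '%s\n' \
    'GENE6579X chr19:23694188-23694395 -0.203827' \
    'GENE735X chr17:66199278-66199496 -0.853890' \
    'GENE1562X chr10:80097054-80097298 -0.397202' > "$workdir/file2"

# Store file2 keyed on its 2nd column, then walk file1 and emit the
# file2 line whose 2nd column matches.
by_location=$(awk 'FNR==NR {row[$2] = $0; next} $2 in row {print row[$2]}' \
    "$workdir/file2" "$workdir/file1")
printf '%s\n' "$by_location"
# GENE735X chr17:66199278-66199496 -0.853890
# GENE1562X chr10:80097054-80097298 -0.397202
# GENE6579X chr19:23694188-23694395 -0.203827

rm -rf "$workdir"
```

The header rows in the real files (GID, AID, EWEIGHT) are not covered here. If the files are tab-separated with empty cells, whitespace splitting will shift the fields on those rows, so they may need to be handled separately.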
