Sort a file based on a column in another file
Problem description
I have two files, both in the format:
loc1 num1 num2
loc2 num3 num4
The first column is the location. I want to use the order of the locations in the first file to sort the second file, so that I can put the two files together with the numbers lined up against the right location.
I could write a Perl script to do this, but I suspect there is some quick and easy shell/awk command that does it. Any ideas?
Thanks.
Edit:
Here is the input. I actually want to use column 2 of File 1 to sort File 2.
File 1:
GID location NAME GWEIGHT C1SI M1CO M1SI C1LY M1LY C1CO C1LI M1LI
AID ARRY2X ARRY1X ARRY3X ARRY4X ARRY5X ARRY0X ARRY6X ARRY7X
EWEIGHT 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
GENE735X chr17:66199278-66199496 chr17:66199278-66199496 1.000000 0.211785 -0.853890 1.071875 0.544136 0.703871 0.371880 0.218960 -2.268618
GENE1562X chr10:80097054-80097298 chr10:80097054-80097298 1.000000 0.533673 -0.397202 0.783363 0.109824 -0.436342 0.158667 0.475748 -1.227730
GENE6579X chr19:23694188-23694395 chr19:23694188-23694395 1.000000 0.127748 -0.203827 0.846738 0.045599 -0.211767 0.415442 0.282123 -1.302055
File 2:
GID location NAME GWEIGHT C1SI M1CO M1SI C1LY M1LY C1CO C1LI M1LI
AID ARRY2X ARRY1X ARRY3X ARRY4X ARRY5X ARRY0X ARRY6X ARRY7X
EWEIGHT 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
GENE6579X chr19:23694188-23694395 chr19:23694188-23694395 1.000000 0.127748 -0.203827 0.846738 0.045599 -0.211767 0.415442 0.282123 -1.302055
GENE735X chr17:66199278-66199496 chr17:66199278-66199496 1.000000 0.211785 -0.853890 1.071875 0.544136 0.703871 0.371880 0.218960 -2.268618
GENE1562X chr10:80097054-80097298 chr10:80097054-80097298 1.000000 0.533673 -0.397202 0.783363 0.109824 -0.436342 0.158667 0.475748 -1.227730
An awk solution: store the second file in memory, then loop over the first file, printing the matching line from the second file:
awk 'FNR==NR {x2[$1] = $0; next} $1 in x2 {print x2[$1]}' second first
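The one-liner above keys on column 1 of both files, while the edit asks to order by column 2 (the location). A minimal sketch, assuming the key sits in the same column of both files and that keys are unique: the hypothetical helper `order_by` below only parameterizes the field number through `awk -v`; the program itself is the same as above.

```bash
# order_by REF TARGET COL   (hypothetical helper name)
# Print the lines of TARGET in the order in which their key (column COL)
# appears in REF. Assumes whitespace-separated fields and unique keys.
order_by() {
  # TARGET is read first and stored by key; REF is read last and
  # therefore drives the output order.
  awk -v k="$3" '
    FNR == NR  { line[$k] = $0; next }  # first file on the command line: TARGET
    $k in line { print line[$k] }       # second file: REF
  ' "$2" "$1"
}

# For the files in the question (key = location, column 2):
# order_by File1 File2 2
```

Keys present in REF but missing from TARGET are silently skipped, and if TARGET repeats a key, only its last line is kept.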
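Implementing @Barmar's comment (number the lines of "first", join on the key, then sort by line number to restore the original order):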
join -1 2 -o "1.1 1.2 2.2 2.3" <(cat -n first | sort -k2) <(sort second) |
sort -n |
cut -d ' ' -f 2-
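`join` requires both inputs to be sorted on the join field in the same collation it compares with. `sort -k2` uses everything from field 2 to the end of the line as the key, which in some locales can order lines differently from a comparison on field 2 alone. A slightly more defensive sketch of the same pipeline, assuming bash (for `<(...)`) and the three-column test files below; the function name `join_order` is hypothetical:

```bash
join_order() (
  # Byte-wise collation for every sort and join in this subshell, so
  # sort produces exactly the order join expects.
  export LC_ALL=C
  # Decorate: number the lines of the first file; sort both inputs on the key only.
  # Join: keep line number, key, and fields 2-3 of the second file.
  # Undecorate: restore the first file's order, then drop the line number.
  join -1 2 -o 1.1,1.2,2.2,2.3 \
      <(cat -n "$1" | sort -b -k2,2) \
      <(sort -k1,1 "$2") |
    sort -n -k1,1 |
    cut -d ' ' -f 2-
)

# join_order first second
```

Like the original, the `-o` list hard-codes three columns, so it would need extending for wider files.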
Note to other answerers: I tested with these files:
$ cat first
foo x y
bar x y
baz x y
$ cat second
bar x1 y1
baz x2 y2
foo x3 y3
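To reproduce that test, the sketch below recreates the two files in a scratch directory (made with `mktemp -d`) and runs the awk one-liner. With these inputs, the lines of "second" come out in the order foo, bar, baz.

```bash
# Recreate the test files in a scratch directory.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'foo x y\nbar x y\nbaz x y\n'       > first
printf 'bar x1 y1\nbaz x2 y2\nfoo x3 y3\n' > second

# Lines of "second", in the order their first field appears in "first".
out=$(awk 'FNR==NR {x2[$1] = $0; next} $1 in x2 {print x2[$1]}' second first)
printf '%s\n' "$out"
# foo x3 y3
# bar x1 y1
# baz x2 y2
```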
Explanation of the awk command:
awk 'FNR==NR {x2[$1] = $0; next} $1 in x2 {print x2[$1]}' second first
This part reads the first file named on the command line (here, "second"):
FNR==NR {x2[$1] = $0; next}
The condition FNR == NR is true only while awk reads the first named file: FNR is awk's "File Record Number" variable, which restarts at 1 for each file, while NR is the record number counted across all input files. The current line is stored in an associative array named x2 (not a great variable name), indexed by the first field of the record.
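A quick way to see how the two counters FNR and NR differ is to print them next to awk's FILENAME variable. The file names a and b below are placeholders.

```bash
dir=$(mktemp -d)
printf 'l1\nl2\n' > "$dir/a"
printf 'l3\n'     > "$dir/b"

# FNR restarts at 1 for each file; NR keeps counting across files.
out=$(cd "$dir" && awk '{print FILENAME, NR, FNR, (FNR==NR ? "first file" : "later file")}' a b)
printf '%s\n' "$out"
# a 1 1 first file
# a 2 2 first file
# b 3 1 later file
```

One consequence: if the first file is empty, NR and FNR stay equal throughout the second file, so FNR==NR is then true for every line of it. A common defensive alternative is to test `FILENAME == ARGV[1]` instead.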
The next condition, $1 in x2, is only reached once the file "second" has been completely read, because next skips the rest of the program for every line of "second". It looks up the first field of each line of the file named "first", and the action prints the corresponding line from "second" that was stored in the array.
Note that the order of the files in the awk command is important. Since the output order is driven by the file named "first", it must be the last file awk processes.
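To see why, the sketch below runs the same program on the test files with the arguments in both orders. Swapped, the stored lines come from "first" and the output follows the order of "second", which is not what the question asks for.

```bash
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'foo x y\nbar x y\nbaz x y\n'       > first
printf 'bar x1 y1\nbaz x2 y2\nfoo x3 y3\n' > second

prog='FNR==NR {x2[$1] = $0; next} $1 in x2 {print x2[$1]}'
right=$(awk "$prog" second first)    # lines of "second", ordered like "first"
swapped=$(awk "$prog" first second)  # lines of "first", ordered like "second"
printf '%s\n' "$right" --- "$swapped"
```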