Find patterns of a file in another file and print out a corresponding field of the latter, maintaining the order
Problem description
I've been trying for a while to solve this problem and I checked many posts (for example here Print lines in one file matching patterns in another file or here awk search for a field in another file) without really finding what I am looking for. I need the solution with bash tools like sed, grep, awk (no python, R,...)
I have two files (the real ones are much bigger than these):
file1:
2 891299 0.50923964E-02 1248 4.713 1349.08
3 245857 0.57915542E-02 1335 4.671 1369.65
file2:
278 2645 2334659 0.75142 0.53123
279 2643 245857 0.80439 0.56868
500 1341 830677 0.74922 0.52958
501 1339 882791 0.87685 0.61980
502 1337 891299 0.63224 0.44680
In this example, I want to look up each value in column 2 of file1 in column 3 of file2 and print column 1 of the matching file2 line, for every line of file1, keeping the order given by file1.
A possible solution (I am aware it is not bug-free) is the following unacceptably slow bash loop:
for i in `awk '{print $2}' file1` ; do grep " $i " file2 | awk '{print $1}' ; done
which prints to the screen:
502
279
Please note that a 'simple' solution like:
awk 'NR==FNR{pats[$2]; next} $3 in pats' file1 file2
is not appropriate as the order of the printing is given by file2 and not by file1 (i.e. it prints to screen first 279 and then 502).
Thank you very much for your help.
Recommended answer
You can reverse the order in which awk processes the files and get the right output:
awk 'NR==FNR{pats[$3]=$1; next} $2 in pats{print pats[$2]}' file2 file1
502
279
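To make the one-liner easier to follow, here is a minimal, self-contained sketch of the same approach. It recreates the two sample files from the question in a temporary directory; the names `tmpdir`, `result`, and the array name `id` are illustrative choices, not part of the original answer.

```shell
#!/bin/sh
# Recreate the sample files from the question in a temporary directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/file1" <<'EOF'
2 891299 0.50923964E-02 1248 4.713 1349.08
3 245857 0.57915542E-02 1335 4.671 1369.65
EOF
cat > "$tmpdir/file2" <<'EOF'
278 2645 2334659 0.75142 0.53123
279 2643 245857 0.80439 0.56868
500 1341 830677 0.74922 0.52958
501 1339 882791 0.87685 0.61980
502 1337 891299 0.63224 0.44680
EOF

# First file on the command line (file2, while NR == FNR):
#   build a lookup table mapping column 3 to column 1.
# Second file (file1): lines are read in file1 order, so printing
#   the stored value for column 2 preserves that order.
result=$(awk '
    NR == FNR { id[$3] = $1; next }
    $2 in id  { print id[$2] }
' "$tmpdir/file2" "$tmpdir/file1")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Because file2 is only read once into an array, each file1 line costs a single lookup instead of a full `grep` pass over file2. Two caveats apply to this sketch: if a column 3 value occurs more than once in file2, only the column 1 value from its last occurrence is kept, and file1 lines with no match in file2 print nothing.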