Bash: find patterns of a file in another file and print out a corresponding field of the latter maintaining the order
Dear all,
I've been trying for a while to solve this problem and have checked many posts (for example "grep, awk or sed? Print lines in one file matching patterns in another file" and "awk search for a field in another file") without really finding what I am looking for. I need a solution using shell tools like sed, grep, awk (no Python, R, ...).
I have two files (much bigger than those):
file1:
2 891299 0.50923964E-02 1248 4.713 1349.08
3 245857 0.57915542E-02 1335 4.671 1369.65
file2:
278 2645 2334659 0.75142 0.53123
279 2643 245857 0.80439 0.56868
500 1341 830677 0.74922 0.52958
501 1339 882791 0.87685 0.61980
502 1337 891299 0.63224 0.44680
In this example I want to look up each value in column 2 of file1 in column 3 of file2 and print column 1 of the matching file2 line, for every line of file1 and in the order given by file1.
A possible solution (which I am aware is not bug-free) is the following unacceptably slow bash loop:
for i in `awk '{print $2}' file1` ; do grep " $i " file2 | awk '{print $1}' ; done
which prints to screen:
502
279
Please note that a 'simple' solution like:
awk 'NR==FNR{pats[$2]; next} $3 in pats' file1 file2
is not appropriate, because the printing order is determined by file2 rather than by file1 (i.e. it prints 279 first and then 502).
Thanks a lot for your help.
Marco
You can swap the order in which awk reads the files: build the lookup table from file2 first, then read file1, so that the output follows the order of file1:
awk 'NR==FNR{pats[$3]=$1; next} $2 in pats{print pats[$2]}' file2 file1
502
279
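For readers who want to see why this keeps the file1 order, here is the same one-liner spread over several lines with comments, as a minimal sketch. The temporary directory and the names `workdir`, `id` and `result` are only illustrative and are not part of the original answer; the sample data is the one from the question.

```shell
# Work in a throwaway directory so no existing files are overwritten.
workdir=$(mktemp -d)

# Recreate the two sample files from the question.
cat > "$workdir/file1" <<'EOF'
2 891299 0.50923964E-02 1248 4.713 1349.08
3 245857 0.57915542E-02 1335 4.671 1369.65
EOF

cat > "$workdir/file2" <<'EOF'
278 2645 2334659 0.75142 0.53123
279 2643 245857 0.80439 0.56868
500 1341 830677 0.74922 0.52958
501 1339 882791 0.87685 0.61980
502 1337 891299 0.63224 0.44680
EOF

result=$(awk '
    # NR == FNR is true only while the first file on the command line
    # (file2) is being read: map its column 3 to its column 1.
    NR == FNR { id[$3] = $1; next }
    # Now reading file1 line by line, so matches come out in file1 order.
    $2 in id  { print id[$2] }
' "$workdir/file2" "$workdir/file1")

printf '%s\n' "$result"
```

Two caveats with this approach: if a value occurs more than once in column 3 of file2, only the column 1 of its last occurrence is kept (the grep loop would print every match), and if file2 is empty the `NR == FNR` test is also true for file1, so nothing is printed.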