awk to match file1 with file2 and output matches

Problem description

I am trying to use awk to match file1 with file2 and print the matching lines to a separate file. File1 is ~4MB, and I get the error below, which I cannot seem to fix. Thank you :).

awk 'NR==FNR{c[$0]; next} ($0 in c)' RS="," file1.txt RS="\n" file2.txt > match.txt

awk: program limit exceeded: maximum number of fields size=32767 FILENAME="sort.2.txt" FNR=1 NR=1
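
A likely explanation for the error, sketched on a tiny input: with RS="," and a file that contains no commas, awk reads the whole file as a single record (hence FNR=1 NR=1), and every whitespace-separated token in the file becomes a field of that one record. On a ~4MB file that can exceed a 32767-field limit. The file name demo.txt is a made-up example, not one of the asker's files.

```shell
# demo.txt is a hypothetical two-line file with no commas in it.
printf 'a b\nc d\n' > demo.txt

# RS="," : no comma is ever found, so the whole file is one record
# and all four tokens are fields of that record.
awk '{ print NR, NF }' RS="," demo.txt

# Default RS (newline): two records with two fields each.
awk '{ print NR, NF }' demo.txt
```

The first command reports one record with four fields; the second reports two records with two fields each.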

File1

chr1:3063265-3063458 AVP:exon.3 8.55959
chr1:947806-947967 RSPO4:exon.3 246.54
chr2:12758246-12758422 CTD-2192J16.22:exon.2;MAN2B1:exon.1;MAN2B1:exon.20;MAN2B1:exon.22 221.483
chr2:57975642-57975745 KIF5A:exon.1;KIF5A:exon.23;KIF5A:exon.26 222.932

File2

AVP
KIF5A

Desired output

chr1:3063265-3063458 AVP:exon.3 8.55959
chr2:57975642-57975745 KIF5A:exon.1;KIF5A:exon.23;KIF5A:exon.26 222.932

Recommended answer

You can try the following, which reads both files line by line with the default record separator:

awk '
    FNR==NR{d[$0]; next;}          #Store each key to find, from file2
    {                              #for each line in file1
        for(k in d){               #for each key in d (file2)
            pat="(^|;)"k":";       #pattern to search (regular expression)
            if($2 ~ pat){
                print;             #print if match with RE
                break;
            }
        }
    }' file2 file1

You get

chr1:3063265-3063458 AVP:exon.3 8.55959
chr2:57975642-57975745 KIF5A:exon.1;KIF5A:exon.23;KIF5A:exon.26 222.932
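
As a variation on the answer above, here is a minimal sketch that splits the second column on ";" and looks each gene name up in the array directly, instead of building a regular expression per key. It assumes every entry in column 2 has the form GENE:exon.N, as in the sample data. The file names file1.txt, file2.txt and match.txt follow the question, and the array names want and parts are illustrative. Exact lookup also avoids treating characters such as "." in a gene name as regex metacharacters.

```shell
# Sample data copied from the question.
cat > file1.txt <<'EOF'
chr1:3063265-3063458 AVP:exon.3 8.55959
chr1:947806-947967 RSPO4:exon.3 246.54
chr2:12758246-12758422 CTD-2192J16.22:exon.2;MAN2B1:exon.1;MAN2B1:exon.20;MAN2B1:exon.22 221.483
chr2:57975642-57975745 KIF5A:exon.1;KIF5A:exon.23;KIF5A:exon.26 222.932
EOF
printf 'AVP\nKIF5A\n' > file2.txt

awk '
    FNR == NR { if ($0 != "") want[$0]; next }   # file2: remember each non-empty gene name
    {
        n = split($2, parts, ";")                # file1: break column 2 into GENE:exon.N entries
        for (i = 1; i <= n; i++) {
            gene = parts[i]
            sub(/:.*/, "", gene)                 # keep only the text before the first colon
            if (gene in want) { print; break }   # exact lookup; print the line once
        }
    }' file2.txt file1.txt > match.txt

cat match.txt
```

With this approach the work per line depends on the number of entries in column 2 rather than on the number of keys in file2, which may help when file2 is long.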
