AWK negative regular expression with variable


Question

I am using awk in a bash script to compare two files and keep only the non-matching lines. I need to compare all three fields of each line in the second file (as one pattern?) against every line of the first file:

First file:

chr1    9997    10330   HumanGM18558_peak_1     150     .       10.78887        18.86368        15.08777        100
chr1    628885  635117  HumanGM18558_peak_2     2509    .       83.77238        255.95094       250.99944       5270
chr1    15966215        15966638        HumanGM18558_peak_3    81      .       7.61567 11.78841        8.17169 200

Second file:

chr1 628885 635117
chr1 1250086 1250413
chr1 16613629 16613934
chr1 16644496 16644800
chr1 16895871 16896489
chr1 16905126 16905616

The current idea is to load one file into an array and use awk's negated regular expression match to compare.

readarray a < file2.txt
for i in "${a[@]}"; do
awk -v var="$i" '!/var/' file1.narrowPeak | cat > output.narrowPeak
done

The problem is that '!/var/' does not work with variables.

Recommended answer

Using awk alone:

$ awk 'NR==FNR{a[$1,$2,$3]; next} !(($1,$2,$3) in a)' file2 file1
chr1    9997    10330   HumanGM18558_peak_1     150     .       10.78887        18.86368        15.08777        100
chr1    15966215        15966638        HumanGM18558_peak_3    81      .       7.61567 11.78841        8.17169 200

  • NR==FNR is true only while the first file on the command line is being read, which is file2 in this example
  • a[$1,$2,$3] creates a key from the first three fields; if whole lines, spacing included, were identical between the two files, you could simply use $0 instead of $1,$2,$3
  • next skips the remaining commands and moves on to the next line of input
  • ($1,$2,$3) in a checks whether the first three fields of the file1 line are present as a key in array a; the leading ! then inverts the condition
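The two-pass idiom described in the bullets can be tried on throwaway data. The following is a minimal sketch, not the answerer's exact session: the temporary directory, the shortened peak names and the array name seen are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical miniature stand-ins for file1 (tab-separated) and file2 (space-separated).
tmpdir=$(mktemp -d)
printf 'chr1\t9997\t10330\tpeak_1\nchr1\t628885\t635117\tpeak_2\nchr1\t15966215\t15966638\tpeak_3\n' > "$tmpdir/file1"
printf 'chr1 628885 635117\nchr1 1250086 1250413\n' > "$tmpdir/file2"

# Pass 1 (NR==FNR, i.e. while reading file2): store each three-field key.
# Pass 2 (file1): print only the lines whose key was never stored.
unmatched=$(awk 'NR==FNR { seen[$1,$2,$3]; next }
                 !(($1,$2,$3) in seen)' "$tmpdir/file2" "$tmpdir/file1")

printf '%s\n' "$unmatched"
rm -rf "$tmpdir"
```

Because the key is built from fields rather than from the raw line, it does not matter that one file uses tabs and the other single spaces.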
Here's another way to write it (thanks to Ed Morton):

awk '{key=$1 FS $2 FS $3} NR==FNR{a[key]; next} !(key in a)' file2 file1
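As for the original '!/var/' attempt: inside /.../ awk reads var as literal text, not as the variable, so only lines containing the string "var" are dropped. A variable can serve as a regex through the match operators, as in $0 !~ var. The sketch below contrasts the two forms; the sample lines and the pattern value are assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical input lines and pattern, for illustration only.
lines='chr1 9997 10330
chr1 628885 635117
var is just text here'
pattern='628885'

# /var/ is a regex literal: it drops only lines containing the text "var".
literal=$(printf '%s\n' "$lines" | awk -v var="$pattern" '!/var/')

# $0 !~ var uses the contents of the variable as the regex.
dynamic=$(printf '%s\n' "$lines" | awk -v var="$pattern" '$0 !~ var')

printf 'literal:\n%s\n\ndynamic:\n%s\n' "$literal" "$dynamic"
```

Even with the dynamic form, a whole line of file2 used as a regex would have to match the spacing in file1 exactly and could also match inside longer numbers, so the field-key approach is likely the more robust choice for this data.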
      
