Identify overlapping ranges in AWK


Question

I have a file with rows of 3 columns (tab separated), e.g.:

2 45 100

And a second file, also with rows of 3 columns (tab separated), e.g.:

2 10 200

I want an awk command that matches lines when $1 is the same in both files and the range $2-$3 in file 1 intersects at all with the range $2-$3 in file 2. The file 1 range can lie within the file 2 range, the file 2 range can lie within the file 1 range, or there can be just a partial overlap. Any kind of intersection between the ranges counts as a match, and the matching row should then be printed to file 3.

My current code only matches if $1 and either $2 or $3 match, but it doesn't work when one range lies within the other, because in those cases the exact numbers don't match.

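  awk '
        BEGIN {
            FS = OFS = "\t"
        }
        # First file: remember each exact (key, start, end) triple.
        FILENAME == ARGV[1] {
            pair[$1, $2, $3] = 1
            next
        }
        # Second file: print only rows whose triple also appears in the first file.
        ($1, $2, $3) in pair {
            print $1, $2, $3
        }
  ' file1 file2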

Example input:

File 1:

1 10 23
2 30 50
6 100 110
8 20 25

File 2:

1 5 15
10 30 50
2 10 100
8 22 24

Here line 1 (file1) matches line 1 (file2) because the first column matches and the two ranges overlap on 10-15. Line 2 (file1) matches line 3 (file2) because the first column matches and the range 30-50 is within the range 10-100. Line 4 (file1) matches line 4 (file2) because the first column matches and the two ranges overlap on 22-24. Therefore the output would be lines 1, 3 and 4 of file2, printed to a new output file.
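The matching rule described above can be reduced to a single comparison: two closed ranges overlap when each one starts no later than the other one ends. The following is only a sketch of that test, not code from the original post. The helper name ranges_overlap is hypothetical, and it assumes that each range is given as start then end with start <= end, and that ranges sharing just an endpoint count as overlapping.

```shell
# Hypothetical helper: prints "yes" if the closed ranges [a,b] and [c,d]
# overlap, "no" otherwise. Assumes a <= b and c <= d.
ranges_overlap() {
    awk -v a="$1" -v b="$2" -v c="$3" -v d="$4" 'BEGIN {
        # Adding 0 forces a numeric (not string) comparison.
        print ((a + 0 <= d + 0 && c + 0 <= b + 0) ? "yes" : "no")
    }'
}

ranges_overlap 10 23 5 15      # partial overlap (10-15)
ranges_overlap 30 50 10 100    # first range inside the second
ranges_overlap 20 25 22 24     # second range inside the first
ranges_overlap 100 110 30 50   # disjoint ranges, shown for contrast
```

The single condition covers partial overlap and both containment cases, so no separate check per case is needed.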

Hope these examples help.

Many thanks for your help.

Thanks in advance!

Recommended answer

This is quite easy if you use the join command to merge both files on their first field ($1). After the join, $2 and $3 hold the file1 range and $4 and $5 hold the file2 range.

If you only want the file2 lines as output:

join <(sort -k1,1 file1) <(sort -k1,1 file2) | awk '{if ($2 >= $4 && $2 <= $5 || $3 >= $4 && $3 <= $5 || $4 >= $2 && $4 <= $3 || $5 >= $2 && $5 <= $3) {print $1" "$4" "$5;}}' -

Using your input files I got this output:

1 5 15
2 10 100
8 22 24
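As an alternative to join, the same matching can be sketched in awk alone, which avoids sorting and keeps the file2 rows in their original order. This is a minimal sketch under stated assumptions, not part of the original answer. The function name overlap_rows is hypothetical, every range is assumed to be written as start then end, the first file is assumed to be non-empty, and the sample files are rebuilt in a temporary directory purely for illustration.

```shell
# Hypothetical helper: print each row of the second file whose range
# overlaps at least one range with the same key ($1) in the first file.
overlap_rows() {
    awk '
        # First file: store every range, grouped by key.
        NR == FNR {
            n[$1]++
            lo[$1, n[$1]] = $2 + 0
            hi[$1, n[$1]] = $3 + 0
            next
        }
        # Second file: closed ranges overlap when each one starts
        # no later than the other one ends.
        {
            for (i = 1; i <= n[$1]; i++)
                if (lo[$1, i] <= $3 + 0 && $2 + 0 <= hi[$1, i]) {
                    print
                    break   # one match is enough; do not print the row twice
                }
        }
    ' "$1" "$2"
}

# Sample data from the question, tab separated.
dir=$(mktemp -d)
printf '1\t10\t23\n2\t30\t50\n6\t100\t110\n8\t20\t25\n' > "$dir/file1"
printf '1\t5\t15\n10\t30\t50\n2\t10\t100\n8\t22\t24\n' > "$dir/file2"

overlap_rows "$dir/file1" "$dir/file2"
```

Redirecting the last command to a file (for example `> file3`) gives the output file asked for in the question. Because all file1 ranges are held in memory, this approach suits a first file of moderate size.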
