Identify overlapping ranges in AWK

Views: 62 | Published: 2020/5/6 9:33:42 | Tags: file, text, awk, comparison, match


Question

I have a file whose rows have 3 tab-separated columns, e.g.:

2 45 100

And a second file whose rows also have 3 tab-separated columns, e.g.:

2 10 200

I want an awk command that matches lines when $1 is the same in both files and the range $2-$3 in file 1 intersects at all with the range $2-$3 in file 2. The range in file 1 can lie within the range in file 2, the range in file 2 can lie within the range in file 1, or there can be just a partial overlap. Any kind of intersection between the ranges counts as a match, and the matching row should then be printed to file 3.

My current code only matches when $1, $2 and $3 are all identical in both files. It does not work when one range lies inside the other, because in those cases the exact numbers differ.

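  awk '
        BEGIN {
            FS = "\t";
            OFS = "\t";
        }
        # First file: remember each exact (key, start, end) triple.
        FILENAME == ARGV[1] {
            pair[ $1, $2, $3 ] = 1;
            next;
        }
        # Second file: print only rows whose triple appeared verbatim in file 1.
        {
            if ( pair[ $1, $2, $3 ] == 1 ) {
                print $1, $2, $3;
            }
        }
  ' file1 file2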

Example input:

File 1:

File 2:

Here line 1 (file1) matches line 1 (file2) because the first column matches and the range 10-15 is shared by both ranges. Line 2 (file1) matches line 3 (file2) because the first column matches and the range 30-50 lies within the range 10-100. Line 4 (file1) matches line 4 (file2) because the first column matches and the range 22-24 is shared by both. The output would therefore be lines 1, 2 and 4 from file2, printed to a new output file.

Hope these examples help.

Many thanks for your help, and thanks in advance!

Recommended answer

It is quite easy if you use the join command to merge both files on their first field ($1). In each merged line, $2 and $3 come from file1 and $4 and $5 come from file2.
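To make the merged layout concrete, here is a minimal sketch of the join step on its own. It is not the answerer's original command: the function name `join_ranges` and the temporary file names are illustrative assumptions. Because join expects its inputs sorted on the join field in collation order, the sketch uses a plain `sort` under `LC_ALL=C` rather than a numeric sort.

```shell
# Sketch: merge two range files on their first column.
# join_ranges is a hypothetical helper name, not part of any tool.
join_ranges() {
    # $1 = first range file, $2 = second range file
    tmpdir=$(mktemp -d) || return 1
    # join needs both inputs sorted on the join field in the same locale.
    LC_ALL=C sort -k1,1 "$1" > "$tmpdir/a"
    LC_ALL=C sort -k1,1 "$2" > "$tmpdir/b"
    # Each output line: key, file1 start, file1 end, file2 start, file2 end
    LC_ALL=C join "$tmpdir/a" "$tmpdir/b"
    rm -rf "$tmpdir"
}

# Usage: join_ranges file1 file2
```

With the two sample rows from the question (`2 45 100` and `2 10 200`), the merged line carries the key followed by both ranges, which is why the awk filter in this answer compares $2 and $3 against $4 and $5.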

If you only want the file2 lines as output:

join --nocheck-order <(sort -n file1) <(sort -n file2) | awk '{if ($2 >= $4 && $2 <= $5 || $3 >= $4 && $3 <= $5 || $4 >= $2 && $4 <= $3 || $5 >= $2 && $5 <= $3) {print $1" "$4" "$5;}}' -

Using your input files I got this output:

1 5 15
2 10 100
8 22 24
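Since the question asked for an awk command, the same check can also be sketched in awk alone, without join. This is an alternative under stated assumptions, not the answerer's method: the function name `overlap_lines` and the array names `n`, `lo` and `hi` are illustrative, and every row is assumed to have start <= end. Under that assumption, two closed ranges intersect exactly when each one starts no later than the other ends, which covers the four cases tested in the join pipeline above.

```shell
# Sketch: print rows of the second file that overlap any same-key range
# in the first file. overlap_lines is a hypothetical helper name.
overlap_lines() {
    # $1 = file1, $2 = file2 (both tab-separated: key, start, end)
    awk -F'\t' '
        # First file: store every range, grouped by the key in column 1.
        NR == FNR {
            n[$1]++
            lo[$1, n[$1]] = $2
            hi[$1, n[$1]] = $3
            next
        }
        # Second file: test the row against each stored range for its key.
        {
            for (i = 1; i <= n[$1]; i++) {
                # Adding 0 forces numeric comparison.
                # Ranges intersect when each starts no later than the other ends.
                if (lo[$1, i] + 0 <= $3 + 0 && $2 + 0 <= hi[$1, i] + 0) {
                    print
                    break   # print each file2 row at most once
                }
            }
        }
    ' "$1" "$2"
}

# Usage: overlap_lines file1 file2 > file3
```

Unlike the join pipeline, this version keeps the rows of file2 in their original order and tab-separated, needs no sorting, and prints a row once even if it overlaps several ranges in file1. It does hold all of file1 in memory, and it assumes file1 is not empty, since the `NR == FNR` test cannot tell the two files apart otherwise.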
