Awk matching between two files when regions intersect (any solutions welcome)

Posted: 2016/7/28 16:53:33. Tags: text, awk, filtering, intersect

Problem description

This builds on an earlier question, "Awk conditional filter one file based on another (or other solutions)": http://stackoverflow.com/questions/12727108/awk-conditional-filter-one-file-based-on-another-or-other-solutions

Quick summary at the bottom of the question.

I have an awk program that prints a column from the rows of a text file, refGene.txt, if the values in a row match 2 out of 3 values in another text file.

I need to add another criterion for finding a match between the two files: a row should count as a match if the range given by the two numerical values in a row of file 1 overlaps the range given by the two values in a row of refGene.txt. Example lines from file 1:

chr1 10 20
chr2 10 20

and an example line from file 2 (refGene.txt), showing only the matching columns ($3, $5, $6):

chr1 5 30

Currently the awk program does not treat this as a match because, although the first column matches, neither the 2nd nor the 3rd column does. But I would like it to count as a match, because the region 10-20 in file 1 lies WITHIN the range 5-30 in refGene.txt. However, the second line in file 1 should NOT match, because its first column does not match, and that match is required. It would be really helpful if there were a way to include every case where any part of a file 1 range overlaps any part of a refGene.txt range (so a partial overlap also counts as a match). The new condition should also replace the conditional statement in the program below, since it would also catch every case that statement currently matches.

In summary, I want awk to print a match if $1 in file1 matches $3 in file2 AND the range $2-$3 in file1 intersects at all with the range $5-$6 in file2.
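The intersection condition in the summary comes down to two comparisons. Two closed ranges [s1, e1] and [s2, e2] intersect exactly when s1 <= e2 and s2 <= e1. The stricter "10-20 lies within 5-30" case is s1 >= s2 and e1 <= e2. The sketch below is a hypothetical helper, `compare_ranges`, that prints both checks for a few sample pairs, so you can see that partial overlaps pass only the first test. It assumes inclusive start/end coordinates on both sides. Whether refGene.txt uses the same coordinate convention as file 1 is not checked here.

```shell
#!/bin/sh
# compare_ranges (hypothetical helper): reads lines of
#   chrom1 start1 end1 chrom2 start2 end2
# and reports whether range 1 is contained in range 2 and whether they overlap.
# Assumes inclusive start/end coordinates on both sides.
compare_ranges() {
    awk '{
        same      = ($1 == $4)
        # range 1 lies entirely inside range 2 (the "WITHIN" case)
        contained = same && ($2 >= $5) && ($3 <= $6)
        # the ranges share at least one position (partial overlap counts)
        overlaps  = same && ($2 <= $6) && ($5 <= $3)
        print $1 ":" $2 "-" $3, "vs", $4 ":" $5 "-" $6, "contained=" contained, "overlaps=" overlaps
    }'
}

printf '%s\n' \
    'chr1 10 20 chr1 5 30' \
    'chr1 1 8 chr1 5 30' \
    'chr1 31 40 chr1 5 30' \
    'chr2 10 20 chr1 5 30' |
compare_ranges
# chr1:10-20 vs chr1:5-30 contained=1 overlaps=1
# chr1:1-8 vs chr1:5-30 contained=0 overlaps=1
# chr1:31-40 vs chr1:5-30 contained=0 overlaps=0
# chr2:10-20 vs chr1:5-30 contained=0 overlaps=0
```

The overlap test is symmetric and already covers containment, so it can replace both an exact-match check and a "within" check.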

Please let me know if my question is unclear. Any help is really appreciated, thanks in advance! (Solutions do not have to be in awk.)

Rubal

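for f in /files/*txt; do
    # The glob also matches /files/refGene.txt itself, so skip it
    [ "$(basename "$f")" = "refGene.txt" ] && continue

    awk '
        BEGIN {
            FS = "\t";
        }
        # First file: remember every (chromosome, start, end) triple
        FILENAME == ARGV[1] {
            pair[ $1, $2, $3 ] = 1;
            next;
        }
        # refGene.txt: print column 13 when columns 3, 5 and 6 exactly match a stored triple
        {
            if ( pair[ $3, $5, $6 ] == 1 ) {
                print $13;
            }
        }
    ' "$f" /files/refGene.txt > "/files/results/$(basename "$f")"
done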

Solution

You just need to use 2 arrays:

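awk -F '\t' '
  # First file: remember the range for each chromosome
  # (a later line for the same chromosome overwrites an earlier one)
  NR == FNR {min[$1] = $2; max[$1] = $3; next}
  # refGene.txt: print column 13 if the stored range lies inside $5-$6
  ($3 in min) && (min[$3] >= $5) && (max[$3] <= $6) {print $13}
' "$f" /files/refGene.txt   # file1 first, then refGene.txt, as in the question's loop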

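NR == FNR is just another way to write FILENAME == ARGV[1]: it compares line numbers instead of file names. NR counts records across all input files, while FNR restarts at 1 for each file, so the two are equal only while awk is reading the first file (this breaks down if the first file is empty).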
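The answer's check accepts only a file1 range that lies completely inside the refGene range. Also, `min[$1]` and `max[$1]` hold a single range per chromosome, so a second file1 line for the same chromosome replaces the first. Below is a minimal sketch that follows the question's summary (any intersection counts) and keeps every file1 range. It is not the answerer's code. The function name `overlap_filter` and the demo file contents, including the gene names, are hypothetical. It assumes tab-separated files and the same inclusive start/end convention in both files. It uses FILENAME == ARGV[1] rather than NR == FNR so that an empty query file is handled safely.

```shell
#!/bin/sh
# overlap_filter (hypothetical helper name): print column 13 of every line of
# REFGENE_FILE whose range ($5-$6 on chromosome $3) overlaps, even partially,
# at least one range ($2-$3 on chromosome $1) in QUERY_FILE.
# Usage: overlap_filter QUERY_FILE REFGENE_FILE
# Assumes tab-separated input and inclusive coordinates in both files.
overlap_filter() {
    awk -F '\t' '
        # Query file: store every range per chromosome, not just the last one
        FILENAME == ARGV[1] {
            k = ++n[$1]
            lo[$1, k] = $2 + 0    # "+ 0" forces numeric comparison later
            hi[$1, k] = $3 + 0
            next
        }
        # refGene file: only chromosomes that appear in the query file can match
        ($3 in n) {
            for (i = 1; i <= n[$3]; i++) {
                # two ranges intersect iff each starts no later than the other ends
                if (lo[$3, i] <= $6 + 0 && $5 + 0 <= hi[$3, i]) {
                    print $13
                    break    # report each refGene line at most once
                }
            }
        }
    ' "$1" "$2"
}

# Demo with made-up data (gene names and coordinates are hypothetical)
tmp=$(mktemp -d)
printf 'chr1\t10\t20\nchr2\t10\t20\n' > "$tmp/file1.txt"
{
    printf '0\tNM_A\tchr1\t+\t5\t30\t.\t.\t.\t.\t.\t.\tGENEA\n'    # contains 10-20
    printf '0\tNM_B\tchr1\t+\t18\t25\t.\t.\t.\t.\t.\t.\tGENEB\n'   # partial overlap
    printf '0\tNM_C\tchr1\t+\t21\t40\t.\t.\t.\t.\t.\t.\tGENEC\n'   # no overlap
    printf '0\tNM_D\tchr3\t+\t5\t30\t.\t.\t.\t.\t.\t.\tGENED\n'    # chromosome not queried
} > "$tmp/refGene.txt"
overlap_filter "$tmp/file1.txt" "$tmp/refGene.txt"    # prints GENEA, then GENEB
rm -rf "$tmp"
```

Inside the question's loop, the awk call would become `overlap_filter "$f" /files/refGene.txt > "/files/results/$(basename "$f")"`. Every refGene line is compared against each file1 range on the same chromosome. That is simple and is probably adequate for small query files, but it scales with the number of ranges per chromosome.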
