Using matching entries only, print a file A line if its column value is between two other column values in file B


Problem description

I have a tab-delimited file, file1:

A 1
A 20
B 17
B 33
C 10
C 20
E 7

and another tab-delimited file, file2:

A 1  5
A 6  20
B 1  10
B 30 60
C 10 20
E 1  6

I need to print the lines of file1 for which col1 of file1 equals col1 of file2 and the value in col2 of file1 falls within one of the ranges given by cols 2 and 3 of file2.

The output should look like:

A 1
A 20
B 33
C 10
C 20

I am trying:

awk 'FNR==NR{a[$1]=$2;next}; ($1) in a{if($2=(a[$1] >= $2 && a[$1] <=$3) {print}}1'  file1  file2 

But it does not work.

Recommended answer

To store multiple ranges per key, you really want to use arrays of arrays or lists. awk doesn't support them directly (GNU awk 4 and later is an exception), but they can be emulated. In this case, arrays of arrays seem likely to be more efficient.

awk '
    # store each range from file2
    FNR==NR {
        n = ++q[$1]
        min[$1 FS n] = $2
        max[$1 FS n] = $3
        next
    }

    # process file1
    n = q[$1] { # if no q entry, line cannot be in range
        for (i=1; i<=n; i++)
            if ( min[$1 FS i]<=$2 && $2<=max[$1 FS i]) {
                print
                next
            }
    }
' file2 file1

Each min/max range needs to be stored separately. By maintaining a counter (q[$1]) of occurrences of each distinct value of col1 ($1), we ensure that every range gets its own distinct array element, keyed by $1 FS n.

Subsequently, when checking the ranges, we know that any particular value of col1 occurs precisely q[$1] times in file2, so the loop only needs to test indices 1 through q[$1].
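The counter-keyed storage described above can also be written with awk's comma subscripts, which join the parts with SUBSEP instead of FS. The following is a minimal sketch of that variation, not the answer's own code: the names count, dir and result are illustrative, the sample files are recreated from the question, and it assumes a system that provides mktemp -d. The values are forced to numbers with + 0 so that the comparisons are numeric.

```shell
#!/bin/sh
# Recreate the sample inputs from the question (tab-delimited).
dir=$(mktemp -d)
printf 'A\t1\nA\t20\nB\t17\nB\t33\nC\t10\nC\t20\nE\t7\n' > "$dir/file1"
printf 'A\t1\t5\nA\t6\t20\nB\t1\t10\nB\t30\t60\nC\t10\t20\nE\t1\t6\n' > "$dir/file2"

result=$(awk -F'\t' '
    # First file given (file2): store every range under the key (col1, counter).
    FNR == NR {
        n = ++count[$1]          # how many ranges seen so far for this col1
        min[$1, n] = $2 + 0      # "+ 0" forces a numeric value
        max[$1, n] = $3 + 0
        next
    }
    # Second file given (file1): test the value against each stored range.
    {
        v = $2 + 0
        for (i = 1; i <= count[$1]; i++)
            if (min[$1, i] <= v && v <= max[$1, i]) {
                print            # print the line once, then move on
                next
            }
    }
' "$dir/file2" "$dir/file1")

printf '%s\n' "$result"
rm -rf "$dir"
```

Run as is, this prints the same five lines as the desired output in the question (A 1, A 20, B 33, C 10, C 20, tab-separated). Using SUBSEP rather than FS in the key avoids collisions if a col1 value could itself contain the field separator.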
