How to remove overlap in numeric ranges (AWK)


Question

I'm trying to remove the overlap within a file.

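  • There are a bunch of records that start with 'A' and have a 'start-value' and an 'end-value'.
  • There are also a bunch of records that start with 'B'. These also have a range and mark a possible overlap with the records that start with 'A'. The idea is to remove the overlapping range from A so that only non-overlapping ranges remain.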

Some of the records in B have the same 'start-value' as a record in A, while others have the same 'end-value'. So, if A has a range of 0 - 100 and B has a range of 0 - 32, then my expected output is: A 33 - 100 and B 0 - 32.

Although I have a lot of files that need to undergo this operation, the individual files are very small.

Here is a sample file:

A   0       100
A   101     160 
A   200     300
A   500     1100
A   1200    1300
A   1301    1340
A   1810    2000
B   0       32
B   500     540
B   1250    1300
B   1319    1340
B   1920    2000

Expected output for the sample:

A   33      100
A   101     160 
A   200     300
A   541     1100
A   1200    1249
A   1301    1318
A   1810    1919
B   0       32
B   500     540
B   1250    1300
B   1319    1340
B   1920    2000

Thanks for all your help!

Recommended answer

OK, since the OP confirmed that the B 501 540 is a typo, I'll post my answer :)

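awk -v OFS="\t" '
# A records: store start/end by line number; l is the last A line
/^A/{s[NR]=$2;e[NR]=$3;l=NR}
/^B/{ 
        # find the A record sharing this B start or end and trim it
        for(i=1;i<=l;i++){
                if(s[i]==$2){
                        s[i]=$3+1       # same start: A now begins after B ends
                        break
                }else if(e[i]==$3){
                        e[i]=$2-1       # same end: A now ends before B begins
                        break
                }
        }
        s[NR] = $2; e[NR]=$3            # the B record itself is kept unchanged
}
# lines 1..l are printed as A, the rest as B (all A records must come first)
END{for(i=1;i<=NR;i++)print ((i<=l)?"A":"B"),s[i],e[i]}
        ' file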

Test with your file (the typo was fixed):

kent$  awk -v OFS="\t" '/^A/{s[NR]=$2;e[NR]=$3;l=NR}
/^B/{ 
        for(i=1;i<=l;i++){
                if(s[i]==$2){
                        s[i]=$3+1
                        break
                }else if(e[i]==$3){
                        e[i]=$2-1
                        break
                }
        }
        s[NR] = $2; e[NR]=$3
}
END{for(i=1;i<=NR;i++)print ((i<=l)?"A":"B"),s[i],e[i]}
        ' file
    A       33      100
    A       101     160
    A       200     300
    A       541     1100
    A       1200    1249
    A       1301    1318
    A       1810    1919
    B       0       32
    B       500     540
    B       1250    1300
    B       1319    1340
    B       1920    2000
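
The command above labels lines by position, so it relies on every A record appearing before the B records. If that ordering is not guaranteed, one option is to read the file twice. The following is only a sketch under that assumption, not part of the original answer: the function name trim_overlaps_any_order is made up for illustration, and the rule is unchanged (an A range is trimmed only when a B range shares its start or its end).

```shell
# Hypothetical helper: same trimming rule, but B records may appear anywhere.
trim_overlaps_any_order() {
    # The same file is passed twice: pass 1 collects B ranges, pass 2 trims and prints.
    awk -v OFS="\t" '
    NR == FNR {                       # pass 1: remember every B range
        if ($1 == "B") { n++; bs[n] = $2; be[n] = $3 }
        next
    }
    $1 == "A" {                       # pass 2: trim A against the stored B ranges
        for (i = 1; i <= n; i++) {
            if (bs[i] == $2)      { $2 = be[i] + 1; break }   # shared start
            else if (be[i] == $3) { $3 = bs[i] - 1; break }   # shared end
        }
    }
    { print $1, $2, $3 }              # records come out in their input order
    ' "$1" "$1"
}
```

Called as trim_overlaps_any_order file, it prints the records in their original order, with A ranges trimmed where a B range shares an endpoint.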

EDIT (for 6 columns):

Quick and dirty; please check the example below:

file:

kent$  cat file
A   0       100 1 2 3
A   101     160 4 5 6
A   200     300 7 8 9
A   500     1100 10 11 12
A   1200    1300 13 14 15
A   1301    1340 16 17 18
A   1810    2000 19 20 21
B   0       32  22 23 24
B   500     540 22 23 24
B   1250    1300 22 23 24
B   1319    1340 22 23 24
B   1920    2000 22 23 24

awk:

kent$  awk -v OFS="\t" '{s[NR]=$2;e[NR]=$3}
/^A/{l=NR}
/^B/{ 
        for(i=1;i<=l;i++){
                if(s[i]==$2){
                        s[i]=$3+1
                        break
                }else if(e[i]==$3){
                        e[i]=$2-1
                        break
                }
        }
}
{r[NR]=$4OFS$5OFS$6}
END{for(i=1;i<=NR;i++)print ((i<=l)?"A":"B"),s[i],e[i],r[i]} ' file
A       33      100     1       2       3
A       101     160     4       5       6
A       200     300     7       8       9
A       541     1100    10      11      12
A       1200    1249    13      14      15
A       1301    1318    16      17      18
A       1810    1919    19      20      21
B       0       32      22      23      24
B       500     540     22      23      24
B       1250    1300    22      23      24
B       1319    1340    22      23      24
B       1920    2000    22      23      24
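
The six-column version hard-codes $4, $5 and $6. If the number of trailing columns varies, a possible variation is to keep each whole record and write the trimmed bounds back into it in the END block. This is a sketch rather than the answerer's code: the function name trim_overlaps_keep_columns is invented for illustration, and it still assumes that all A records precede the B records.

```shell
# Hypothetical helper: trims A ranges and keeps any number of trailing columns.
trim_overlaps_keep_columns() {
    awk -v OFS="\t" '
    { s[NR] = $2; e[NR] = $3; line[NR] = $0 }    # keep bounds and the full record
    /^A/ { l = NR }                               # l = line number of the last A record
    /^B/ {
        for (i = 1; i <= l; i++) {
            if (s[i] == $2)      { s[i] = $3 + 1; break }   # shared start
            else if (e[i] == $3) { e[i] = $2 - 1; break }   # shared end
        }
    }
    END {
        for (i = 1; i <= NR; i++) {
            $0 = line[i]              # re-split the stored record into fields
            $2 = s[i]; $3 = e[i]      # write back the (possibly trimmed) bounds
            print                     # assigning a field rebuilds $0 with OFS
        }
    }
    ' "$1"
}
```

Since the question mentions many small files, the function could be run in a loop, for example: for f in *.txt; do trim_overlaps_keep_columns "$f" > "$f.trimmed"; done (the *.txt pattern and the .trimmed suffix are placeholders).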
