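How to remove overlap in numeric ranges (AWK)
Question
I'm trying to remove the overlap within a file.
- There are a bunch of records starting with 'A' that have a 'start-value' and an 'end-value'.
- There are also a number of records starting with 'B' that have a range as well and that may overlap with the records starting with 'A'. The idea is to remove the overlapping range from A so that only non-overlapping ranges remain.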
Some of the records in B have the same 'start-value' as a record in A, while others have the same 'end-value'. So, if A has a range of 0 - 100 and B has a range of 0 - 32, then my expected output is: A 33 - 100 and B 0 - 32.
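The trimming rule in this example can be sketched as a small awk function. This is only an illustration, assuming B shares either its start or its end with A; the names trim and trim_demo are hypothetical and not part of the question.

```shell
# Hypothetical helper: print the part of an A range that remains
# after removing a B range. Assumes B shares its start or its end with A.
trim_demo() {
    awk '
    function trim(astart, aend, bstart, bend) {
        if (astart == bstart) return (bend + 1) " " aend     # same start: A begins after B ends
        if (aend == bend)     return astart " " (bstart - 1) # same end: A stops before B begins
        return astart " " aend                               # no shared endpoint: A unchanged
    }
    { print "A", trim($1, $2, $3, $4) }
    '
}

# Each input line holds: A-start A-end B-start B-end
printf '0 100 0 32\n1200 1300 1250 1300\n' | trim_demo
# prints:
# A 33 100
# A 1200 1249
```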
Although I have a lot of files that need to undergo this operation, the individual files are very small.
Here is a sample file:
A 0 100
A 101 160
A 200 300
A 500 1100
A 1200 1300
A 1301 1340
A 1810 2000
B 0 32
B 500 540
B 1250 1300
B 1319 1340
B 1920 2000
Expected sample output:
A 33 100
A 101 160
A 200 300
A 541 1100
A 1200 1249
A 1301 1318
A 1810 1919
B 0 32
B 500 540
B 1250 1300
B 1319 1340
B 1920 2000
Thanks for all your help!
Recommended answer
OK, since the OP confirmed that B 501 540 was a typo, I'm posting my answer :)
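awk -v OFS="\t" '
# A records: store start and end by line number; l is the last A line
/^A/{s[NR]=$2;e[NR]=$3;l=NR}
/^B/{
    # look for the A record that shares a start or end value with this B record
    for(i=1;i<=l;i++){
        if(s[i]==$2){
            s[i]=$3+1    # same start: A now begins right after B ends
            break
        }else if(e[i]==$3){
            e[i]=$2-1    # same end: A now stops right before B begins
            break
        }
    }
    s[NR] = $2; e[NR]=$3    # store the B record unchanged
}
# lines 1..l are printed as A, the rest as B (the A block must come first)
END{for(i=1;i<=NR;i++)print ((i<=l)?"A":"B"),s[i],e[i]}
' file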
Test with your file (with the typo fixed):
kent$ awk -v OFS="\t" '/^A/{s[NR]=$2;e[NR]=$3;l=NR}
/^B/{
for(i=1;i<=l;i++){
if(s[i]==$2){
s[i]=$3+1
break
}else if(e[i]==$3){
e[i]=$2-1
break
}
}
s[NR] = $2; e[NR]=$3
}
END{for(i=1;i<=NR;i++)print ((i<=l)?"A":"B"),s[i],e[i]}
' file
A 33 100
A 101 160
A 200 300
A 541 1100
A 1200 1249
A 1301 1318
A 1810 1919
B 0 32
B 500 540
B 1250 1300
B 1319 1340
B 1920 2000
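One way to sanity-check output like the above is to confirm that no A range still overlaps a B range. The following is a minimal sketch, assuming whitespace-separated records of the form label, start, end; the function name check_overlap is hypothetical and not part of the answer.

```shell
# Hypothetical checker: exit status 0 if no A range overlaps any B range,
# 1 otherwise (each overlapping pair is reported on stdout).
check_overlap() {
    awk '
    $1 == "A" { astart[++na] = $2 + 0; aend[na] = $3 + 0 }   # +0 forces numeric values
    $1 == "B" { bstart[++nb] = $2 + 0; bend[nb] = $3 + 0 }
    END {
        bad = 0
        for (i = 1; i <= na; i++)
            for (j = 1; j <= nb; j++)
                # two closed ranges overlap when each starts before the other ends
                if (astart[i] <= bend[j] && bstart[j] <= aend[i]) {
                    print "overlap: A", astart[i], aend[i], "with B", bstart[j], bend[j]
                    bad = 1
                }
        exit bad
    }
    ' "$@"
}

# Reads stdin when no file name is given
printf 'A 33 100\nB 0 32\n' | check_overlap && echo "no overlap"
```

The check compares every A range against every B range, which is fine here because the individual files are small.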
Edit (for 6 columns):
Quick and dirty; please check the example below:
File:
kent$ cat file
A 0 100 1 2 3
A 101 160 4 5 6
A 200 300 7 8 9
A 500 1100 10 11 12
A 1200 1300 13 14 15
A 1301 1340 16 17 18
A 1810 2000 19 20 21
B 0 32 22 23 24
B 500 540 22 23 24
B 1250 1300 22 23 24
B 1319 1340 22 23 24
B 1920 2000 22 23 24
awk:
kent$ awk -v OFS="\t" '{s[NR]=$2;e[NR]=$3}
/^A/{l=NR}
/^B/{
for(i=1;i<=l;i++){
if(s[i]==$2){
s[i]=$3+1
break
}else if(e[i]==$3){
e[i]=$2-1
break
}
}
}
{r[NR]=$4OFS$5OFS$6}
END{for(i=1;i<=NR;i++)print ((i<=l)?"A":"B"),s[i],e[i],r[i]} ' file
A 33 100 1 2 3
A 101 160 4 5 6
A 200 300 7 8 9
A 541 1100 10 11 12
A 1200 1249 13 14 15
A 1301 1318 16 17 18
A 1810 1919 19 20 21
B 0 32 22 23 24
B 500 540 22 23 24
B 1250 1300 22 23 24
B 1319 1340 22 23 24
B 1920 2000 22 23 24
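If the number of extra columns varies, the same idea could be written without naming $4, $5 and $6 explicitly. This is a sketch only, under the same assumptions as the commands above (all A lines come before the B lines, and each B shares a start or an end with some A); the function name trim_keep_columns is hypothetical.

```shell
# Hypothetical variant: trim A ranges and keep every extra column,
# however many there are. Output is tab-separated.
trim_keep_columns() {
    awk -v OFS="\t" '
    { line[NR] = $0; s[NR] = $2; e[NR] = $3 }   # remember every record
    /^A/ { l = NR }                             # last A line (A block comes first)
    /^B/ {
        for (i = 1; i <= l; i++) {
            if (s[i] == $2)      { s[i] = $3 + 1; break }   # same start
            else if (e[i] == $3) { e[i] = $2 - 1; break }   # same end
        }
    }
    END {
        for (i = 1; i <= NR; i++) {
            n = split(line[i], f)        # split the saved record on whitespace
            f[2] = s[i]; f[3] = e[i]     # substitute the (possibly trimmed) range
            out = f[1]
            for (k = 2; k <= n; k++) out = out OFS f[k]
            print out
        }
    }
    ' "$@"
}

printf 'A 0 100 1 2 3\nB 0 32 22 23 24\n' | trim_keep_columns
```

Because the label is taken from the saved line rather than from the line position, this version prints whatever is in the first column instead of reconstructing "A" or "B".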