Finding overlap in ranges with R
Question
I have two data.frames, each with three columns: chrom, start and stop. Let's call them rangesA and rangesB. For each row of rangesA, I want to find which row of rangesB (if any) fully contains it, by which I mean rangesAChrom == rangesBChrom, rangesAStart >= rangesBStart and rangesAStop <= rangesBStop.
Right now I'm doing the following, which I don't like very much. Note that I'm looping over the rows of rangesA for other reasons, but none of those reasons is likely to be a big deal; the loop just ends up making things more readable given this particular solution.
rangesA:
chrom start stop
5 100 105
1 200 250
9 275 300
rangesB:
chrom start stop
1 200 265
5 99 106
9 275 290
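To reproduce the example, the two tables above could be built as data frames along these lines. This is a minimal sketch: the values are copied from the tables, and storing chrom as a plain number is an assumption (real chromosome labels are often character strings).

```r
# Sample data copied from the tables above.
# Assumption: chrom is stored as a number; use character vectors for labels like "X".
rangesA <- data.frame(chrom = c(5, 1, 9),
                      start = c(100, 200, 275),
                      stop  = c(105, 250, 300))

rangesB <- data.frame(chrom = c(1, 5, 9),
                      start = c(200, 99, 275),
                      stop  = c(265, 106, 290))
```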
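for (row in seq_len(nrow(rangesA))) {
  # Vectorized & is needed here: && expects single values, so it would not test every row of rangesB
  matches <- which((rangesB[, 'chrom'] == rangesA[row, 'chrom']) &
                   (rangesB[, 'start'] <= rangesA[row, 'start']) &
                   (rangesB[, 'stop'] >= rangesA[row, 'stop']))
}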
I figure there has to be a better way to do this than looping over this construct, and by better I mean faster over large instances of rangesA and rangesB. Any ideas?
Recommended answer
This would be a lot easier and faster if you merge the two objects first.
ranges <- merge(rangesA,rangesB,by="chrom",suffixes=c("A","B"))
ranges[with(ranges, startB <= startA & stopB >= stopA),]
# chrom startA stopA startB stopB
#1 1 200 250 200 265
#2 5 100 105 99 106
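The question asks which row of rangesB contains each row of rangesA, and the merged result alone does not carry row numbers. One way to recover them, sketched here under assumptions, is to add index columns before merging. The names rowA, rowB, merged, contained and containing_row are hypothetical helpers that do not appear in the original post, and the sample data is rebuilt so the snippet runs on its own.

```r
# Sample data from the question (chrom stored as a number for simplicity).
rangesA <- data.frame(chrom = c(5, 1, 9), start = c(100, 200, 275), stop = c(105, 250, 300))
rangesB <- data.frame(chrom = c(1, 5, 9), start = c(200, 99, 275), stop = c(265, 106, 290))

# Hypothetical helper columns that remember the original row numbers.
rangesA$rowA <- seq_len(nrow(rangesA))
rangesB$rowB <- seq_len(nrow(rangesB))

# Pair every rangesA row with every rangesB row on the same chromosome,
# then keep only the pairs where the B range fully contains the A range.
merged <- merge(rangesA, rangesB, by = "chrom", suffixes = c("A", "B"))
contained <- merged[merged$startB <= merged$startA & merged$stopB >= merged$stopA, ]

# For each row of rangesA: the row number of a containing rangesB row, or NA if there is none.
# match() keeps only the first hit, so additional containing rows are not reported here.
containing_row <- contained$rowB[match(rangesA$rowA, contained$rowA)]
containing_row
```

Because merge() pairs every rangesA row with every rangesB row on the same chromosome, the intermediate table can get large when a chromosome has many ranges in both inputs, so it is worth checking memory use on real data.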