Subset only those rows whose intervals do not fall within another data.frame
How can I compare two data frames (test and control) of unequal length and remove rows from test based on three criteria: i) test$chr == control$chr, ii) test$start and test$end lie within the range of control$start and control$end, and iii) test$CNA and control$CNA are the same? A row of test should be removed only when all three hold for some row of control.
test =
R_level logp chr start end CNA Gene
2 7.079 11 1159 1360 gain Recl,Bcl
11 2.4 12 6335 6345 loss Pekg
3 19 13 7180 7229 loss Sox1
control =
R_level logp chr start end CNA Gene
2 5.9 11 1100 1400 gain Recl,Bcl
2 3.46 11 1002 1345 gain Trp1
2 6.4 12 6705 6845 gain Pekg
4 7 13 6480 8129 loss Sox1
The result should look something like this
result =
R_level logp chr start end CNA Gene
11 2.4 12 6335 6345 loss Pekg
Here's one way using foverlaps() from data.table.
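library(data.table) # v1.9.4+
dt1 <- as.data.table(test)
dt2 <- as.data.table(control)
# Key on the grouping columns first; the last two key columns define the interval
setkey(dt2, chr, CNA, start, end)
# Row pairs where a test interval lies within a control interval (same chr and CNA)
olaps <- foverlaps(dt1, dt2, nomatch=0L, which=TRUE, type="within")
#    xid yid
# 1:   1   2
# 2:   3   4
# Keep the test rows that had no such match
dt1[!olaps$xid]
#    R_level logp chr start  end  CNA Gene
# 1:      11  2.4  12  6335 6345 loss Pekg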
Read ?foverlaps and see the examples section for more info.
Alternatively, you can use the GenomicRanges package. However, as far as I can tell, you would have to filter on CNA yourself after finding the overlapping regions.
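For reference, the GenomicRanges route might look roughly like the sketch below. It is not from the original answer: the helper name to_gr is made up for illustration, the sample data is retyped from the question, and it assumes the Bioconductor package GenomicRanges is installed. The CNA check is done by hand on the hit pairs, as described above.

```r
library(GenomicRanges)  # Bioconductor package; assumed to be installed

# Sample data retyped from the question
test <- data.frame(
  R_level = c(2, 11, 3),
  logp    = c(7.079, 2.4, 19),
  chr     = c(11, 12, 13),
  start   = c(1159, 6335, 7180),
  end     = c(1360, 6345, 7229),
  CNA     = c("gain", "loss", "loss"),
  Gene    = c("Recl,Bcl", "Pekg", "Sox1"),
  stringsAsFactors = FALSE
)
control <- data.frame(
  R_level = c(2, 2, 2, 4),
  logp    = c(5.9, 3.46, 6.4, 7),
  chr     = c(11, 11, 12, 13),
  start   = c(1100, 1002, 6705, 6480),
  end     = c(1400, 1345, 6845, 8129),
  CNA     = c("gain", "gain", "gain", "loss"),
  Gene    = c("Recl,Bcl", "Trp1", "Pekg", "Sox1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn chr/start/end columns into a GRanges object
to_gr <- function(df) {
  GRanges(seqnames = as.character(df$chr),
          ranges   = IRanges(start = df$start, end = df$end))
}

# Pairs (test row, control row) where the test interval lies within the
# control interval on the same chromosome
hits <- findOverlaps(to_gr(test), to_gr(control), type = "within")

# Of those pairs, keep only the ones that also agree on CNA
same_cna <- test$CNA[queryHits(hits)] == control$CNA[subjectHits(hits)]
drop     <- unique(queryHits(hits)[same_cna])

# setdiff() rather than a negative index, so an empty 'drop' keeps every row
result <- test[setdiff(seq_len(nrow(test)), drop), ]
result
```

On the sample data this should leave only the Pekg row, matching the foverlaps() result. Unlike the data.table version, CNA is not part of the overlap join here, so the equality check is a separate step.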