How to keep join column unchanged in data.table non-equi join?
Question
I was trying to remove rows from a data.frame where the value in column posn is not within the ranges given in another data.frame, using data.table's non-equi join feature.
This is what my data looks like:
library(data.table)
df.cov <-
structure(list(posn = c(1, 2, 3, 165, 1000), att = c("a", "b",
"c", "d", "e")), .Names = c("posn", "att"), row.names = c(NA,
-5L), class = "data.frame")
df.exons <-
structure(list(start = c(2889, 2161, 277, 164, 1), end = c(3329,
2826, 662, 662, 168)), .Names = c("start", "end"), row.names = c(NA,
-5L), class = "data.frame")
setDT(df.cov)
setDT(df.exons)
df.cov
# posn att
# 1: 1 a
# 2: 2 b
# 3: 3 c
# 4: 165 d
# 5: 1000 e
df.exons # ranges of `posn` to include
# start end
# 1: 2889 3329
# 2: 2161 2826
# 3: 277 662
# 4: 164 662
# 5: 1 168
Here is what I tried:
df.cov[df.exons, on = .(posn >= start, posn <= end), nomatch = 0]
# posn att posn.1
# 1: 164 d 662
# 2: 1 a 168
# 3: 1 b 168
# 4: 1 c 168
# 5: 1 d 168
You can see that the posn column from df.cov has also changed: it now holds the start values from df.exons. The expected result looks like this:
# posn att
# 1: 165 d
# 2: 1 a
# 3: 2 b
# 4: 3 c
# 5: 165 d
# The row order doesn't matter; I'll sort by posn later.
# It is also fine if the duplicated rows are removed; otherwise I'll do that in the next step.
How can I get the desired output with a data.table non-equi join?
Recommended answer
You can also use %inrange%:
df.cov[posn %inrange% df.exons]
which gives:
posn att
1: 1 a
2: 2 b
3: 3 c
4: 165 d
As you can see, this leaves the values of the posn column unchanged.
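For readers who prefer to name the bounds explicitly, the same filter can be sketched with the function form inrange(). The table names cov and exons below are illustrative copies of the question's data, not objects from the original post. Passing the whole df.exons table to %inrange% works because it has exactly two columns in lower, upper order; the explicit form does not depend on that.

```r
library(data.table)

# Illustrative copies of the question's data (names are assumptions)
cov   <- data.table(posn = c(1, 2, 3, 165, 1000),
                    att  = c("a", "b", "c", "d", "e"))
exons <- data.table(start = c(2889, 2161, 277, 164, 1),
                    end   = c(3329, 2826, 662, 662, 168))

# Keep rows whose posn falls inside at least one [start, end] interval.
# inrange() returns one logical value per element of posn, so no rows
# are duplicated and no columns are overwritten.
res <- cov[inrange(posn, exons$start, exons$end)]
res
```

Because this is a plain logical subset rather than a join, each row of cov appears at most once, even when posn falls inside several ranges.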
Another, though longer, possibility:
df.exons[df.cov
, on = .(start <= posn, end >= posn)
, mult ='first'
, nomatch = 0
, .(posn = i.posn, att)][]
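If the join direction from the question is to be kept, a further option is to ask for the original column through the x. prefix in j. This is a minimal sketch, not part of the original answer. The names cov and exons are illustrative copies of the question's data. In a join of the form x[i, on = ...], the join columns in the result take their values from i, which is why posn was replaced by the start values; x.posn refers to the untouched column from x.

```r
library(data.table)

# Illustrative copies of the question's data (names are assumptions)
cov   <- data.table(posn = c(1, 2, 3, 165, 1000),
                    att  = c("a", "b", "c", "d", "e"))
exons <- data.table(start = c(2889, 2161, 277, 164, 1),
                    end   = c(3329, 2826, 662, 662, 168))

# Same non-equi join as in the question, but j selects x.posn (the
# original values from cov) instead of the overwritten join column.
joined <- cov[exons, on = .(posn >= start, posn <= end), nomatch = 0L,
              .(posn = x.posn, att)]

# posn 165 matches two ranges, so drop the duplicate and sort by posn.
res <- unique(joined)
setorder(res, posn)
res
```

The unique() and setorder() calls cover the two follow-up steps the question mentions, removing duplicated rows and sorting by posn.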