How to keep join column unchanged in data.table non-equi join?


Question

I am trying to remove the rows of a data.frame whose value in column posn does not fall within any of the ranges given in another data.frame, using data.table's non-equi join feature.

This is what my data looks like:

library(data.table)
df.cov <-
    structure(list(posn = c(1, 2, 3, 165, 1000), att = c("a", "b",
    "c", "d", "e")), .Names = c("posn", "att"), row.names = c(NA,
    -5L), class = "data.frame")
df.exons <-
    structure(list(start = c(2889, 2161, 277, 164, 1), end = c(3329,
    2826, 662, 662, 168)), .Names = c("start", "end"), row.names = c(NA,
    -5L), class = "data.frame")

setDT(df.cov)
setDT(df.exons)

df.cov
#    posn att
# 1:    1   a
# 2:    2   b
# 3:    3   c
# 4:  165   d
# 5: 1000   e
df.exons # ranges of `posn` to include
#    start  end
# 1:  2889 3329
# 2:  2161 2826
# 3:   277  662
# 4:   164  662
# 5:     1  168

This is what I tried:

df.cov[df.exons, on = .(posn >= start, posn <= end), nomatch = 0]
#    posn att posn.1
# 1:  164   d    662
# 2:    1   a    168
# 3:    1   b    168
# 4:    1   c    168
# 5:    1   d    168

As you can see, the posn column from df.cov has been changed as well: it now holds the start values from df.exons, and posn.1 holds the end values. The expected result looks like this:

#    posn att
# 1:  165   d
# 2:    1   a
# 3:    2   b
# 4:    3   c
# 5:  165   d
# The row order doesn't matter. I'll sort by posn later.
# It is also fine if the duplicated rows are removed; otherwise I'll do that in the next step.

How can I do this with a data.table non-equi join?

Recommended answer

You can also use %inrange%:

df.cov[posn %inrange% df.exons]

which gives:


   posn att
1:    1   a
2:    2   b
3:    3   c
4:  165   d


As you can see, this leaves the values of the posn column unchanged.
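
For readers who prefer to see the bounds spelled out, here is a minimal sketch of the same filter written with the function form inrange(x, lower, upper). It assumes that the first column of df.exons is the lower bound and the second the upper bound, which is how the %inrange% call above reads the table; the data is rebuilt with data.table() only so that the block runs on its own, and res is just an illustrative name.

```r
library(data.table)

# Same data as in the question, rebuilt so the block is self-contained
df.cov <- data.table(posn = c(1, 2, 3, 165, 1000),
                     att  = c("a", "b", "c", "d", "e"))
df.exons <- data.table(start = c(2889, 2161, 277, 164, 1),
                       end   = c(3329, 2826, 662, 662, 168))

# Keep the rows whose posn lies in at least one [start, end] interval.
# Bounds are inclusive by default (incbounds = TRUE).
res <- df.cov[inrange(posn, df.exons$start, df.exons$end)]
res
```

Because this is a plain row filter rather than a join, each row of df.cov is returned at most once, even when posn falls in several ranges (as 165 does here).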

Another, though longer, possibility:

df.exons[df.cov
         , on = .(start <= posn, end >= posn)
         , mult ='first'
         , nomatch = 0
         , .(posn = i.posn, att)][]
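
If you would rather keep the join direction from the question, a third option, sketched here as an illustration rather than as part of the answer above, is to ask for the original column in j with the x. prefix. In a non-equi join the join column in the result is filled with the values from i (here start and end), while x.posn refers to the untouched posn values of df.cov. The unique() and setorder() calls are optional cleanup steps matching what the question says it would do afterwards.

```r
library(data.table)

# Same data as in the question, rebuilt so the block is self-contained
df.cov <- data.table(posn = c(1, 2, 3, 165, 1000),
                     att  = c("a", "b", "c", "d", "e"))
df.exons <- data.table(start = c(2889, 2161, 277, 164, 1),
                       end   = c(3329, 2826, 662, 662, 168))

# x.posn is the posn column of df.cov (the x table), not the bound from df.exons
res <- df.cov[df.exons,
              .(posn = x.posn, att),
              on = .(posn >= start, posn <= end),
              nomatch = 0]

res <- unique(res)   # posn 165 matches two ranges, so drop the duplicate row
setorder(res, posn)  # sort by posn, as planned in the question
res
```

Unlike mult = 'first', this variant returns one row per matching range before unique() is applied, which is useful if you later want to carry the matching start and end along as well.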
