R: efficiently select specified rows from a data.table according to another data.table?

Problem description

I have a data.table called dt and another called sg, and I want to select a subset of dt according to sg: a row of dt should be kept only if its (colA, colB) pair does not match the (colA, colB) pair of any row in sg. Here is what I did:

library(data.table)

dt <- data.table(colA = c(1, 1, 1, 2, 2, 3, 3), colB = c(10, 10, 10, 20, 20, 30, 30), 
  colC = c("A", "I", "A", "A", "A", "I", "A"))
dt

sg <- data.table(colA = c(1, 3), colB = c(10, 30))
sg

dt2 <- paste(dt[, colA], dt[, colB], sep = "-")
sg2 <- paste(sg[, colA], sg[, colB], sep = "-")
dt[!(dt2 %in% sg2)]
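# OR the following one. Caution: it tests each column separately, so it would
# also drop a row such as (colA = 1, colB = 30) that is not in sg; it gives the
# same result only on this sample data.
# dt[!((dt[, colA] %in% sg[, colA]) & (dt[, colB] %in% sg[, colB]))]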
> dt
   colA colB colC
1:    1   10    A
2:    1   10    I
3:    1   10    A
4:    2   20    A
5:    2   20    A
6:    3   30    I
7:    3   30    A

> sg
   colA colB
1:    1   10
2:    3   30

> dt[!(dt2 %in% sg2)]
   colA colB colC
1:    2   20    A
2:    2   20    A

However, when the data set is big, the paste operation is slow. Can you help me work out a more efficient way to do this kind of subsetting of a data.table?

Thanks.

Recommended answer

You can set the same key columns on both tables and then do a not-join (anti-join), which keeps only the rows of dt that have no match in sg:

> setkey(dt, colA, colB)
> setkey(sg, colA, colB)
> dt[!sg]
   colA colB colC
1:    2   20    A
2:    2   20    A

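This should be more efficient than the paste approach, since the keyed join matches rows on the columns directly instead of building a string for every row.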
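
If you would rather not reorder dt by setting keys, the same not-join can probably be written with the on= argument instead (available in data.table 1.9.6 and later, as far as I know). This is a minimal sketch on the sample tables from the question; the name res is only for illustration:

```r
library(data.table)

# Sample tables copied from the question
dt <- data.table(colA = c(1, 1, 1, 2, 2, 3, 3),
                 colB = c(10, 10, 10, 20, 20, 30, 30),
                 colC = c("A", "I", "A", "A", "A", "I", "A"))
sg <- data.table(colA = c(1, 3), colB = c(10, 30))

# Not-join (anti-join): keep the rows of dt whose (colA, colB) pair
# does not appear in sg. No setkey() call, so dt keeps its row order.
res <- dt[!sg, on = c("colA", "colB")]
res
```

On this data, res holds the same two rows with colA = 2 and colB = 20 as the keyed version. Unlike setkey(), which sorts dt by reference, the on= form leaves both tables untouched.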
