R: efficiently select specified rows from a data.table according to another data.table?
Question
I have a data.table called dt and another called sg, and I want to select a subset of dt according to sg: a row of dt should be kept only if its (colA, colB) pair does not equal the (colA, colB) pair of any row in sg. Here is what I did:
library(data.table)
dt <- data.table(colA = c(1, 1, 1, 2, 2, 3, 3), colB = c(10, 10, 10, 20, 20, 30, 30),
colC = c("A", "I", "A", "A", "A", "I", "A"))
dt
sg <- data.table(colA = c(1, 3), colB = c(10, 30))
sg
dt2 <- paste(dt[, colA], dt[, colB], sep = "-")
sg2 <- paste(sg[, colA], sg[, colB], sep = "-")
dt[!(dt2 %in% sg2)]
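# OR the following one (note: NOT equivalent; it tests colA and colB separately,
# so it would also drop a row such as (1, 30) whose pair does not occur in sg)
# dt[!((dt[, colA] %in% sg[, colA]) & (dt[, colB] %in% sg[, colB]))]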
> dt
colA colB colC
1: 1 10 A
2: 1 10 I
3: 1 10 A
4: 2 20 A
5: 2 20 A
6: 3 30 I
7: 3 30 A
> sg
colA colB
1: 1 10
2: 3 30
> dt[!(dt2 %in% sg2)]
colA colB colC
1: 2 20 A
2: 2 20 A
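The commented alternative only gives the same answer here by coincidence. A minimal sketch with a hypothetical extra dt row (1, 30), which is not a pair in sg, shows that testing the two columns independently drops rows the pairwise test keeps:

```r
library(data.table)

# Hypothetical dt with an extra row (1, 30): colA = 1 and colB = 30 each
# appear in sg, but the pair (1, 30) does not
dt <- data.table(colA = c(1, 2, 1), colB = c(10, 20, 30))
sg <- data.table(colA = c(1, 3), colB = c(10, 30))

# Pairwise test (what the paste approach does): keeps (2, 20) and (1, 30)
pairwise <- dt[!(paste(colA, colB, sep = "-") %in% paste(sg$colA, sg$colB, sep = "-"))]

# Column-wise test: (1, 30) is wrongly dropped because each value
# matches some row of sg on its own
columnwise <- dt[!((colA %in% sg$colA) & (colB %in% sg$colB))]

nrow(pairwise)    # 2
nrow(columnwise)  # 1
```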
However, when the data set is big, the paste operation is slow. Can you help me find a more efficient way to do this kind of subsetting with data.table?
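To see the cost on a larger table, here is a rough timing sketch on hypothetical synthetic data (the sizes and random values are assumptions chosen only for illustration). It compares the paste approach with a data.table anti-join via the `on` argument; actual timings depend on the machine and data, so no numbers are claimed here:

```r
library(data.table)

set.seed(1)
n <- 1e6
# Hypothetical synthetic data: integer key columns with many repeats
dt <- data.table(colA = sample(1e3, n, replace = TRUE),
                 colB = sample(1e3, n, replace = TRUE))
# Hypothetical exclusion table: 10,000 sampled rows, deduplicated
sg <- unique(dt[sample(nrow(dt), 1e4)])

# Approach from the question: build string keys, then %in%
t_paste <- system.time({
  res_paste <- dt[!(paste(colA, colB, sep = "-") %in%
                    paste(sg$colA, sg$colB, sep = "-"))]
})

# Anti-join on (colA, colB): no intermediate strings are built
t_join <- system.time({
  res_join <- dt[!sg, on = c("colA", "colB")]
})

print(t_paste)
print(t_join)
```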
Thanks.
Recommended answer
You can set keys on both tables and do an anti-join (a "not-join", `dt[!sg]`), which returns the rows of dt whose (colA, colB) pair has no match in sg:
> setkey(dt, colA, colB)
> setkey(sg, colA, colB)
> dt[!sg]
colA colB colC
1: 2 20 A
2: 2 20 A
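This avoids building pasted string keys and lets data.table match on the sorted key columns directly, so it should be much more efficient on large tables.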
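In data.table versions that support the `on` argument in `[` (1.9.6 and later), the same anti-join can be written without calling `setkey()`, which otherwise re-sorts dt by reference. A minimal sketch using the question's data:

```r
library(data.table)

dt <- data.table(colA = c(1, 1, 1, 2, 2, 3, 3),
                 colB = c(10, 10, 10, 20, 20, 30, 30),
                 colC = c("A", "I", "A", "A", "A", "I", "A"))
sg <- data.table(colA = c(1, 3), colB = c(10, 30))

# Ad hoc anti-join on (colA, colB): keep rows of dt with no matching
# pair in sg; neither table is keyed or modified in place
res <- dt[!sg, on = c("colA", "colB")]
res
```

This is convenient when dt's existing order or key must be preserved; if the same columns are joined on repeatedly, setting the key once as in the answer may still be preferable.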