R - data.table by group by - order of keys and missing keys
Problem description
If I have a data.table
> DT1 <- data.table(A=rep(c('A', 'B'), 3),
                    B=rep(c(1,2,3), 2),
                    val=rnorm(6), key='A,B')
> DT1
A B val
1: A 1 -1.6283314
2: B 2 0.5337604
3: A 3 0.9991301
4: B 1 1.1421400
5: A 2 0.1230095
6: B 3 0.4988504
and I want to subset by more than one key, like so:
> DT1[J('A', 1)]
A B val
1: A 1 -0.004898047
However, the join depends on the order of the keys, so the value for key A must always come first. Supplying the values in a different order does not work, even if you name them (either in J() or in a list()):
> DT1[J(1, 'A')]
Error in `[.data.table`(DT1, J(1, "A")) :
x.'A' is a character column being joined to i.'V1' which is type 'double'. Character columns must join to factor or character columns.
> DT1[J(B=1, A='A')]
Error in `[.data.table`(DT1, J(B = 1, A = "A")) :
x.'A' is a character column being joined to i.'B' which is type 'double'. Character columns must join to factor or character columns.
Is there a syntax that lets you do this kind of subsetting in i without knowing the order of the keys?
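For illustration, data.table versions that support the on= argument of [.data.table let the join columns be named explicitly, so the values in i no longer have to follow the key order. The following is only a minimal sketch under that assumption; the val column uses fixed numbers instead of rnorm() so that the result is reproducible.

```r
library(data.table)

# Same shape as DT1 in the question, but with fixed values (an assumption
# made here for reproducibility; the question uses rnorm())
DT1 <- data.table(A = rep(c("A", "B"), 3),
                  B = rep(c(1, 2, 3), 2),
                  val = c(10, 20, 30, 40, 50, 60),
                  key = c("A", "B"))

# on= names the join columns, so they are matched by name
# rather than by their position in the key
res <- DT1[list(B = 1, A = "A"), on = c("B", "A")]
print(res)
```

Because the match is by name, swapping the order of B and A in the list should give the same single row.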
Added: Another use case would be subsetting by B only and not by A. Is there a way to skip keys in the subsetting? The current answers that create wrapper functions for J don't seem to allow this.
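As a hedged sketch of the "skip a key" case, a join can name only the column B through the on= argument, assuming a data.table version that supports it. The fixed val numbers are an assumption for reproducibility.

```r
library(data.table)

# Fixed values instead of rnorm() (assumption, for reproducibility)
DT1 <- data.table(A = rep(c("A", "B"), 3),
                  B = rep(c(1, 2, 3), 2),
                  val = c(10, 20, 30, 40, 50, 60),
                  key = c("A", "B"))

# Join on B alone; the leading key column A is simply not mentioned
res_b <- DT1[list(B = 1), on = "B"]
print(res_b)
```

Since B is not the leading key column, this lookup cannot use the table's sort order in the same way as a join on the full key, so its timing may differ from the keyed J() join.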
Edit: Some have mentioned doing it the data.frame way. I know that you can subset with a vector of logical values, but this is slow because it scans the entire table:
> DT1 <- data.table(A=rep(c(1,2,3), 100000), B=rep(c('A', 'B'), 150000), val=rnorm(300000), key='A,B')
> system.time(DT1[DT1$A==1 & DT1$B=="A"])
user system elapsed
0.080 0.000 0.054
> system.time(DT1[J(1, 'A')])
user system elapsed
0.004 0.000 0.004
Some references to related discussions: (1)
Recommended answer
In the spirit of @Frank's answer, but trying to get the key automagically:
myJ2 = function(...) {
# 'x' a couple of frames above is where the original data.table sits
data.table(..., key = key(get('x', parent.frame(n = 3))))
}
DT1[myJ2(B=1, A='A')]
# A B val
#1: A 1 0.4328698
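The wrapper above finds the table by looking a fixed number of frames up the call stack, which depends on how [.data.table happens to be implemented. A variation that takes the table as an explicit argument avoids that dependency. The sketch below is only an illustration: keyed_lookup is a hypothetical helper name, not part of data.table, and the fixed val numbers are an assumption for reproducibility.

```r
library(data.table)

# Hypothetical helper (not part of data.table): takes named values in any
# order and rearranges them to match key(dt) before doing the keyed join
keyed_lookup <- function(dt, ...) {
  vals <- list(...)
  k <- key(dt)
  # Require a keyed table and exactly one named value per key column
  stopifnot(!is.null(k), setequal(names(vals), k))
  lookup <- as.data.table(vals[k])  # columns are now in key order
  dt[lookup]
}

DT1 <- data.table(A = rep(c("A", "B"), 3),
                  B = rep(c(1, 2, 3), 2),
                  val = c(10, 20, 30, 40, 50, 60),
                  key = c("A", "B"))

out <- keyed_lookup(DT1, B = 1, A = "A")
print(out)
```

Like the wrapper above, this sketch requires a value for every key column, so it does not cover the case of skipping a key.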