R - data.table by group by - order of keys and missing keys

Views: 141 | Published: 2020/10/15 21:04:14 | Tags: r, data.table

Question

If I have a data.table

> DT1 <- data.table(A=rep(c('A', 'B'), 3),
                    B=rep(c(1,2,3), 2),
                    val=rnorm(6), key='A,B')
> DT1
   A B        val
1: A 1 -1.6283314
2: B 2  0.5337604
3: A 3  0.9991301
4: B 1  1.1421400
5: A 2  0.1230095
6: B 3  0.4988504

and I want to subset by more than one key, like so:

> DT1[J('A', 1)]                                                               
   A B          val
1: A 1 -0.004898047

However, the join is dependent on the order of the keys, so the value for key A must always come first. This will not work, even if you specify names (either as J() or as a list()):

> DT1[J(1, 'A')]
Error in `[.data.table`(DT1, J(1, "A")) : 
  x.'A' is a character column being joined to i.'V1' which is type 'double'. Character columns must join to factor or character columns.

> DT1[J(B=1, A='A')]
Error in `[.data.table`(DT1, J(B = 1, A = "A")) : 
  x.'A' is a character column being joined to i.'B' which is type 'double'. Character columns must join to factor or character columns.

Is there a syntax where you can do this kind of grouping by i without knowing the order of the keys?
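
One possible way to make the lookup independent of key order is sketched below. It assumes a version of data.table that supports the `on` argument to `[`, which names the join columns explicitly. The table `DT` and its fixed `val` column are illustrative stand-ins for `DT1`, chosen so the result is reproducible.

```r
library(data.table)

# Same shape as DT1 in the question, with fixed values instead of rnorm().
DT <- data.table(A = rep(c("A", "B"), 3),
                 B = rep(c(1, 2, 3), 2),
                 val = 1:6,
                 key = c("A", "B"))

# `on` matches columns by name, so the order of the values inside .()
# does not have to follow key(DT).
res <- DT[.(B = 1, A = "A"), on = c("B", "A")]
print(res)
```

Because the columns are matched by name, swapping the arguments inside `.()` should give the same row.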

Added: Another use case would be if I wanted to subset by B only and not by A -- is there a way to skip keys in the subsetting? The current answers that create wrapper functions for J don't seem to allow this.
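
For the case of skipping a key column, a minimal sketch along the same lines is shown here, again assuming the `on` argument is available. It joins on `B` alone and does not rely on the `(A, B)` key, so it may not be as fast as a lookup on the leading key column. `DT` is an illustrative copy of `DT1` with fixed values.

```r
library(data.table)

# Illustrative copy of DT1 with fixed values.
DT <- data.table(A = rep(c("A", "B"), 3),
                 B = rep(c(1, 2, 3), 2),
                 val = 1:6,
                 key = c("A", "B"))

# Join on B only; column A is left unconstrained.
res_b <- DT[.(1), on = "B"]
print(res_b)
```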

Edit: Some have mentioned doing it the data.frame way. I know that you can use a vector of logical values to subset, but this is slow because it scans the entire table:

> DT1 <- data.table(A=rep(c(1,2,3), 100000), B=rep(c('A', 'B'), 150000), val=rnorm(300000), key='A,B')
> system.time(DT1[DT1$A==1 & DT1$B=="A"])
   user  system elapsed 
  0.080   0.000   0.054 
> system.time(DT1[J(1, 'A')])
   user  system elapsed 
  0.004   0.000   0.004
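
As a small check that the two approaches select the same rows, the sketch below compares a logical-vector subset with a keyed join on a tiny table. In the logical form both conditions go in `i`, combined with `&`. The table `DT` and its contents are illustrative, not the data timed above.

```r
library(data.table)

# Small keyed table with the same column types as the timing example.
DT <- data.table(A = rep(c(1, 2, 3), 4),
                 B = rep(c("A", "B"), 6),
                 val = 1:12,
                 key = c("A", "B"))

# Vector scan: builds a full-length logical vector, then subsets.
scan_res <- DT[DT$A == 1 & DT$B == "A"]

# Keyed join: looks up the key values directly.
join_res <- DT[.(1, "A")]

print(identical(scan_res$val, join_res$val))
```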

Some references to related discussions: (1)