data.table: do not compute NA groups in by
Problem description
This question has a partial answer here, but that question is too specific and I am not able to apply it to my own problem.
I would like to skip a potentially heavy computation for the NA group when grouping with by.
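library(data.table)
DT = data.table(X = sample(10),
                Y = sample(10),
                g1 = sample(letters[1:2], 10, TRUE),
                g2 = sample(letters[1:2], 10, TRUE))
# Set g1 (column 3) and g2 (column 4) to NA in rows 1 and 6
set(DT, 1L, 3L, NA)
set(DT, 1L, 4L, NA)
set(DT, 6L, 3L, NA)
set(DT, 6L, 4L, NA)
# Aggregate by (g1, g2); the (NA, NA) rows form a group of their own
DT[, mean(X*Y), by = .(g1,g2)]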
Here we can see there are up to 5 groups, including the (NA, NA) group. Considering that (i) this group is useless, (ii) the groups can be very big, and (iii) the actual computation is more complex than mean(X*Y), can I skip the group in an efficient way? By that I mean without creating a copy of the remaining table. The following works, but it creates such a copy:
DT2 = data.table:::na.omit.data.table(DT, cols = c("g1", "g2"))
DT2[, mean(X*Y), by = .(g1,g2)]
Solution
You can use an if clause:
DT[, if (!anyNA(.BY)) mean(X*Y), by = .(g1,g2)]
g1 g2 V1
1: b a 25.75000
2: a b 24.00000
3: b b 35.33333
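Since the question's data comes from sample(), the numbers above will differ from run to run. As a reproducible sketch of the same idea, the snippet below uses a small fixed table (the values are made up for illustration and are not from the question) and checks the if clause against the copy-based na.omit approach. It assumes that a j expression returning NULL for a group simply contributes no rows for that group.

```r
library(data.table)

# Small fixed table (illustrative values, not the question's random data)
DT <- data.table(
  X  = 1:6,
  Y  = 6:1,
  g1 = c("a", "a", "b", "b", NA, NA),
  g2 = c("a", "b", "a", "b", NA, "a")
)

# if() without else yields NULL when any grouping value is NA, so that
# group adds no rows and mean(X * Y) is never evaluated for it
res <- DT[, if (!anyNA(.BY)) mean(X * Y), by = .(g1, g2)]

# Reference result from the copy-based approach in the question
ref <- na.omit(DT, cols = c("g1", "g2"))[, mean(X * Y), by = .(g1, g2)]

print(res)
print(identical(res$V1, ref$V1))
```

Note that this drops every group with an NA in either key, so a (NA, "a") group is skipped as well as (NA, NA).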
From the ?.BY help:
.BY is a list containing a length 1 vector for each item in by. This can be useful [...] to branch with if() depending on the value of a group variable.
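To see what the help text describes, a minimal sketch like the one below records a few properties of .BY for each group. The table and the output column names (is_list, n_items, by_names, has_na) are made up for illustration.

```r
library(data.table)

# Illustrative table; the last row has an NA in g1
DT <- data.table(
  v  = 1:4,
  g1 = c("a", "a", "b", NA),
  g2 = c("x", "y", "x", "x")
)

# For each group, inspect .BY: it is a list with one length-1 element
# per grouping column, named after those columns
info <- DT[, .(
  is_list  = is.list(.BY),
  n_items  = length(.BY),
  by_names = paste(names(.BY), collapse = ","),
  has_na   = anyNA(.BY)
), by = .(g1, g2)]

print(info)
```

Only the group whose g1 is NA should report has_na as TRUE, which is the condition the if clause in the answer tests.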