data.table: do not compute NA groups in by


Problem description



This question has a partial answer here but the question is too specific and I'm not able to apply it to my own problem.

I would like to skip a potentially heavy computation of the NA group when using by.

library(data.table)

DT = data.table(X = sample(10), 
                Y = sample(10), 
                g1 = sample(letters[1:2], 10, TRUE),
                g2 = sample(letters[1:2], 10, TRUE))

set(DT, 1L, 3L, NA)
set(DT, 1L, 4L, NA)
set(DT, 6L, 3L, NA)
set(DT, 6L, 4L, NA)

DT[, mean(X*Y), by = .(g1,g2)]

The result has up to 5 groups, including the (NA, NA) group. Given that (i) this group is useless, (ii) the groups can be very big, and (iii) the actual computation is more complex than mean(X*Y), can I skip the group efficiently, that is, without creating a copy of the remaining table? The following works, but it builds a filtered copy first:

DT2 = data.table:::na.omit.data.table(DT, cols = c("g1", "g2"))
DT2[, mean(X*Y), by = .(g1,g2)]
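
As a side note, the `:::` call above reaches into data.table's namespace. Calling the exported `na.omit()` generic on a data.table should dispatch to the same method and accepts the same `cols` argument. A minimal sketch, using small made-up values rather than the random data from the question:

```r
library(data.table)

# Illustrative data only: rows 2 and 4 have an NA in a grouping column
DT <- data.table(
  X  = 1:4,
  Y  = c(10, 20, 30, 40),
  g1 = c("a", NA, "b", "b"),
  g2 = c("a", NA, "b", NA)
)

# Drop rows with NA in either grouping column; the result is a new table
DT2 <- na.omit(DT, cols = c("g1", "g2"))

# Group the filtered table as usual
res2 <- DT2[, mean(X * Y), by = .(g1, g2)]
print(res2)
```

This still materialises `DT2`, so it does not address the copy the question wants to avoid; it only removes the need for `:::`.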

Solution

You can use an if clause:

DT[, if (!anyNA(.BY)) mean(X*Y), by = .(g1,g2)]

   g1 g2       V1
1:  b  a 25.75000
2:  a  b 24.00000
3:  b  b 35.33333

From the ?.BY help:

.BY is a list containing a length 1 vector for each item in by. This can be useful [...] to branch with if() depending on the value of a group variable.
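
Because the question's data come from `sample()` without a seed, the output above will differ between runs. The following sketch repeats the same idea on a small fixed table (the values are made up for illustration) so the effect of the `if` clause is easy to check by hand:

```r
library(data.table)

# Illustrative data only: the last two rows form the (NA, NA) group
DT <- data.table(
  X  = 1:6,
  Y  = rep(2, 6),
  g1 = c("a", "a", "b", "b", NA, NA),
  g2 = c("a", "b", "a", "a", NA, NA)
)

# .BY holds one length-1 value per grouping column, so anyNA(.BY) is TRUE
# whenever the group key contains an NA. For those groups the if() has no
# else branch, j evaluates to NULL, and the group is dropped from the result.
res <- DT[, if (!anyNA(.BY)) mean(X * Y), by = .(g1, g2)]
print(res)
```

Here the three non-NA groups are kept and `mean(X * Y)` is never evaluated for the (NA, NA) group, which is the point when the real computation is expensive.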
