data.table: do not compute NA groups in by

Views: 163. Published: 2018/5/30 13:56:00. Tags: r, group-by, data.table, grouping, na

Question

This question has a partial answer here, but that question is too specific and I am not able to apply it to my own problem.

I would like to skip a potentially heavy computation of the NA group when using by.

library(data.table)

DT = data.table(X = sample(10), 
                Y = sample(10), 
                g1 = sample(letters[1:2], 10, TRUE),
                g2 = sample(letters[1:2], 10, TRUE))

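# Put NA into both grouping columns (columns 3 and 4) of rows 1 and 6
set(DT, 1L, 3L, NA)
set(DT, 1L, 4L, NA)
set(DT, 6L, 3L, NA)
set(DT, 6L, 4L, NA)

# Rows whose keys are (NA, NA) form a group of their own
DT[, mean(X*Y), by = .(g1,g2)]
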
Here we can see that there are up to 5 groups, including the (NA, NA) group. Considering that (i) this group is useless, (ii) the groups can be very big, and (iii) the actual computation is more complex than mean(X*Y), can I skip the group in an efficient way? I mean, without creating a copy of the remaining table. Indeed, the following works, but it creates such a copy:

# na.omit() dispatches to the data.table method, which accepts cols
DT2 = na.omit(DT, cols = c("g1", "g2"))
DT2[, mean(X*Y), by = .(g1,g2)]

Solution
You can use an if clause:

DT[, if (!anyNA(.BY)) mean(X*Y), by = .(g1,g2)]

   g1 g2       V1
1:  b  a 25.75000
2:  a  b 24.00000
3:  b  b 35.33333

From the ?.BY help:

  .BY is a list containing a length 1 vector for each item in by. This can be useful [...] to branch with if() depending on the value of a group variable.
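
To make the branching on .BY concrete, here is a minimal sketch on a small fixed table. The data values and the names by_info and res are assumptions made up for illustration; they are not from the original question. The first call inspects what .BY holds for each group, and the second shows that groups for which j returns NULL simply do not appear in the result.

```r
library(data.table)

# Hypothetical, fixed data so that the result is reproducible
DT <- data.table(
  X  = 1:6,
  Y  = rep(2, 6),
  g1 = c("a", "a", "b", "b", NA, NA),
  g2 = c("a", "a", "b", "b", NA, "a")
)

# Inspect .BY: it holds one length-1 element per grouping column
by_info <- DT[, .(n_keys = length(.BY), has_na = anyNA(.BY)), by = .(g1, g2)]
print(by_info)

# j returns NULL for every group whose key contains an NA,
# so mean(X * Y) is not evaluated for those groups and they yield no rows
res <- DT[, if (!anyNA(.BY)) mean(X * Y), by = .(g1, g2)]
print(res)
```

Because the test uses anyNA(.BY), a group is skipped as soon as any one of its keys is NA, not only when all of them are. If only the (NA, NA) group should be skipped, a condition such as !all(is.na(.BY)) would be one way to express that.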


                        