data.table 1.8.x mean() function auto removing NA?

Question

Today I found a bug in my program caused by data.table automatically removing NA values when computing mean().

For example:

> a<-data.table(a=c(NA,NA,FALSE,FALSE), b=c(1,1,2,2))
> a

> a[,list(mean(a), sum(a)),by=b]
   b V1 V2
1: 1  0 NA # Why is V1 = 0 here? I had expected NA
2: 2  0  0


> mean(c(NA,NA,FALSE,FALSE))
[1] NA
> mean(c(NA,NA))
[1] NA
> mean(c(FALSE,FALSE))
[1] 0

Is this the expected behavior?

Answer

This isn't intended. It looks like a problem with optimization: with optimization turned off, the first group returns NA as expected.

> a[,list(mean(a), sum(a)),by=b]
   b V1 V2
1: 1  0 NA
2: 2  0  0
> options(datatable.optimize=FALSE)
> a[,list(mean(a), sum(a)),by=b]
   b V1 V2
1: 1 NA NA
2: 2  0  0
> 
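
As an independent cross-check, the grouped mean can also be computed with base R alone, which does not go through data.table's optimization of j at all. The following is a minimal sketch; the names vals, grp, grouped_mean and grouped_sum are illustrative and do not come from the original post.

```r
# Same data as in the question, held in plain vectors (illustrative names).
vals <- c(NA, NA, FALSE, FALSE)
grp  <- c(1, 1, 2, 2)

# tapply() applies base R's mean()/sum() to each group, so the default
# na.rm = FALSE is honored: a group containing NA yields NA.
grouped_mean <- tapply(vals, grp, mean)
grouped_sum  <- tapply(vals, grp, sum)

print(grouped_mean)
print(grouped_sum)
```

If a data.table result disagrees with this for the same data, the optimization is a likely suspect.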

This was investigated and fixed in v1.8.9, soon to be on CRAN. From NEWS:

mean() in j has been optimized since v1.8.2 but wasn't respecting na.rm=TRUE (the default) [sic: the default of mean() in base R is na.rm=FALSE, which is the setting the optimized version ignored]. Many thanks to Colin Fang for reporting. Test added.

The new feature in v1.8.2 was:

mean() is now automatically optimized, #1231. This can speed up grouping by 20 times when there are a large number of groups. See wiki point 3, which is no longer needed to know. Turn off optimization by setting options(datatable.optimize=0).
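
For anyone stuck on an affected version, one way to apply that option only around a single query, and restore the previous setting afterwards, might look like the sketch below. The helper name with_optimize_off is hypothetical (it is not part of data.table); only options(), getOption() and on.exit() from base R are relied on. The data.table call is guarded so the snippet also runs where the package is not installed.

```r
# Hypothetical helper: evaluate an expression with data.table's
# optimization switched off, then restore the previous option value.
with_optimize_off <- function(expr) {
  old <- options(datatable.optimize = 0)  # options() returns the old value
  on.exit(options(old), add = TRUE)       # restore even if expr errors
  force(expr)                             # expr is evaluated lazily, i.e. here
}

# The option is 0 only while the expression is being evaluated.
before <- getOption("datatable.optimize")
inside <- with_optimize_off(getOption("datatable.optimize"))
after  <- getOption("datatable.optimize")

# Usage with the data from the question, if data.table is available.
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)
  dt <- data.table(a = c(NA, NA, FALSE, FALSE), b = c(1, 1, 2, 2))
  print(with_optimize_off(dt[, list(mean(a), sum(a)), by = b]))
}
```

Scoping the change this way keeps the speed-up for the rest of the session. Upgrading to a release that contains the fix is the more durable remedy.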
