data.table and stratified means


Problem description

I've got some code that generates stratified weighted means, and I'm certain it worked a few months ago, but I'm not sure what the current problem is. (I apologize; this must be very basic stuff.)

dp=
structure(list(seqn = c(1L, 2L, 3L, 4L, 6L, 7L, 8L, 9L, 10L, 
11L, 12L, 13L, 3L, 4L, 9L, 10L, 11L, 14L, 8L, 11L, 12L, 10L, 
5L, 13L, 2L, 14L, 3L, 9L, 6L, 7L), sex = c(2L, 1L, 2L, 2L, 1L, 
2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), bmi = c(22.8935608711259, 
27.0944623781918, 40.4637162938634, 23.7649712675423, 15.3193372705538, 
31.1280302540991, 21.4866354393239, 20.3200254374398, 32.331092513536, 
25.3679771839413, 33.9400508162971, 14.7048592172926, 25.5243757788688, 
23.4331882363495, 27.6428134168995, 29.3923629426172, 24.9547209666314, 
17.0522203606383, 15.51, 22, 30.62, 30.94, 29.1, 25.57, 24.9, 
27.33, 17.63, 18.48, 22.56, 29.39), tc = c(273L, 181L, 150L, 
201L, 142L, 165L, 235L, 219L, 298L, 222L, 143L, 134L, 268L, 160L, 
236L, 225L, 260L, 140L, 162L, 132L, 156L, 140L, 279L, 314L, 215L, 
174L, 129L, 148L, 153L, 245L), swt = c(1645, 3318, 2280, 1574, 
4062, 1627, 14604, 24675, 975, 975, 2697, 1559, 1737.58, 1730.23, 
19521.36, 28080.57, 1248.43, 13745.77, 5251.76464426326, 6497.194885522, 
15915.7023420765, 3740.96809540218, 16574.177622509, 307.32513798849, 
4720.89748295751, 3247.78896499604, 7698.70949077031, 1262.6450411464, 
6609.43340735515, 4254.23723479882)), .Names = c("seqn", "sex", 
"bmi", "tc", "swt"), row.names = c(20560L, 20561L, 20562L, 20563L, 
20565L, 20566L, 20567L, 20568L, 20569L, 20570L, 20571L, 20572L, 
61335L, 61336L, 61338L, 61339L, 61340L, 61341L, 95465L, 96890L, 
104613L, 105988L, 107581L, 112267L, 113403L, 114292L, 119979L, 
120271L, 125939L, 135699L), class = "data.frame")

library(data.table)
dt=data.table(dp, key='sex')

# overall weighted mean of every column: this works
sapply(dp, function(x) weighted.mean(x, dp$swt))
# overall unweighted mean: this also works
dt[, lapply(.SD, mean, na.rm=TRUE), .SDcols=c('bmi','tc','swt')]

# weighted mean within each stratum (by sex): this is the call that fails
dt[, lapply(.SD, function(x) weighted.mean(x, swt, na.rm=TRUE)), by=key(dt), .SDcols=c('bmi','tc','swt')]

But this gives an error:
Error in weighted.mean.default(x, swt, na.rm = TRUE) : object 'swt' not found

sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.8.6

loaded via a namespace (and not attached):
[1] tools_2.15.2


Recommended answer

UPDATE (from Arun): This is now fixed in v1.8.11. From NEWS:


o DT[, lapply(.SD, function(), by=] did not see columns of DT when optimisation is "on". This is now fixed, #2381. Tests added and tested successfully. Thanks to David F for reporting on SO: data.table and stratified means






This is indeed a bug introduced somewhere between 1.8.2 and 1.8.6.

dt[,lapply(.SD, function(x) weighted.mean(x,swt, na.rm=TRUE)), by=key(dt),
    .SDcols=c('bmi','tc','swt')] 
Error in weighted.mean.default(x, swt, na.rm = TRUE) : 
    object 'swt' not found

To work around this in the meantime, either turn off optimization:

options(datatable.optimize=FALSE)
dt[,lapply(.SD, function(x)weighted.mean(x,swt, na.rm=TRUE)), by=key(dt),    
    .SDcols=c('bmi','tc','swt')]
   sex      bmi       tc      swt
1:   1 25.64376 206.0115 17171.20
2:   2 23.73566 193.8727 11467.47

or don't wrap with function():

options(datatable.optimize=TRUE)
dt[,lapply(.SD, weighted.mean, swt, na.rm=TRUE), by=key(dt),    
    .SDcols=c('bmi','tc','swt')] 
   sex      bmi       tc      swt
1:   1 25.64376 206.0115 17171.20
2:   2 23.73566 193.8727 11467.47
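As an independent cross-check on stratified weighted means, the same quantity can be computed in base R without data.table. This is only a sketch: the helper name `strat_wmean` and the toy data are hypothetical and are not part of the original question or answer.

```r
# Hypothetical toy data (made up for illustration, not the question's data)
toy <- data.frame(
  sex = c(1L, 1L, 2L, 2L),
  bmi = c(20, 30, 22, 26),
  tc  = c(200, 100, 150, 250),
  swt = c(1, 3, 1, 1)
)

# Hypothetical helper: split the rows by stratum, then take the weighted
# mean of each requested column using that stratum's own weights.
# Written for two or more columns in `cols`.
strat_wmean <- function(d, cols, w, by) {
  t(sapply(split(d, d[[by]]), function(g) {
    sapply(g[cols], weighted.mean, w = g[[w]], na.rm = TRUE)
  }))
}

res <- strat_wmean(toy, cols = c("bmi", "tc"), w = "swt", by = "sex")
print(res)
```

Calling `strat_wmean(dp, c("bmi", "tc", "swt"), "swt", "sex")` on the question's data would be expected to agree with the data.table output above, which makes it a convenient sanity check when a data.table upgrade changes behaviour.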

We are making more use of optimization now, but this case slipped through the test suite: tests 825.1, 825.2 and 825.3 didn't cover an argument to a function being another column, within an anonymous function(). It would be a problem where the function isn't already given; i.e., unlike this case, where the function() can just be omitted since weighted.mean is already given and can be applied as-is.
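Where an anonymous function seems unavoidable, one way to sidestep the `lapply(.SD, ...)` path altogether is to name the columns explicitly in j. The sketch below assumes a small made-up table (`toy` is hypothetical, not the question's data); it is an alternative formulation, not the fix that went into data.table.

```r
library(data.table)

# Hypothetical toy table (made up for illustration)
toy <- data.table(
  sex = c(1L, 1L, 2L, 2L),
  bmi = c(20, 30, 22, 26),
  tc  = c(200, 100, 150, 250),
  swt = c(1, 3, 1, 1),
  key = "sex"
)

# Each column is referenced directly in j, so the weight column swt is
# looked up in the grouped data without going through lapply(.SD, ...).
res <- toy[, list(bmi = weighted.mean(bmi, swt, na.rm = TRUE),
                  tc  = weighted.mean(tc,  swt, na.rm = TRUE)),
           by = sex]
print(res)
```

This is more typing when there are many columns, but it does not depend on how `lapply(.SD, function(x) ...)` is rewritten internally.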

You can see how optimization modifies j by setting verbose=TRUE (either per query or with the global option). In this case nothing would have been revealed as wrong by that verbose output, but just mentioning it as an aside.
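The two ways of switching on verbose output mentioned above can be sketched as follows. The table `toy` is a hypothetical stand-in for the question's data, and the exact messages printed will vary by data.table version.

```r
library(data.table)

# Hypothetical toy table (made up for illustration)
toy <- data.table(
  sex = c(1L, 1L, 2L, 2L),
  bmi = c(20, 30, 22, 26),
  swt = c(1, 3, 1, 1),
  key = "sex"
)

# Per query: pass verbose = TRUE to this one call only
res <- toy[, lapply(.SD, weighted.mean, swt, na.rm = TRUE),
           by = sex, .SDcols = "bmi", verbose = TRUE]

# Globally: set the option, run queries, then restore the previous value
old <- options(datatable.verbose = TRUE)
toy[, lapply(.SD, weighted.mean, swt, na.rm = TRUE), by = sex, .SDcols = "bmi"]
options(old)
```

Saving the value returned by `options()` and restoring it afterwards keeps the global setting from leaking into later queries.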

Now filed as #2381: optimization of lapply(.SD, function() ...) no longer sees columns inside ...

Thanks!
