data.table and stratified means
Problem description
I've got some code that generates stratified weighted means, and I'm certain it worked a few months ago, but I can't tell what the current problem is. (I apologize; this must be very basic stuff.)
dp=
structure(list(seqn = c(1L, 2L, 3L, 4L, 6L, 7L, 8L, 9L, 10L,
11L, 12L, 13L, 3L, 4L, 9L, 10L, 11L, 14L, 8L, 11L, 12L, 10L,
5L, 13L, 2L, 14L, 3L, 9L, 6L, 7L), sex = c(2L, 1L, 2L, 2L, 1L,
2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), bmi = c(22.8935608711259,
27.0944623781918, 40.4637162938634, 23.7649712675423, 15.3193372705538,
31.1280302540991, 21.4866354393239, 20.3200254374398, 32.331092513536,
25.3679771839413, 33.9400508162971, 14.7048592172926, 25.5243757788688,
23.4331882363495, 27.6428134168995, 29.3923629426172, 24.9547209666314,
17.0522203606383, 15.51, 22, 30.62, 30.94, 29.1, 25.57, 24.9,
27.33, 17.63, 18.48, 22.56, 29.39), tc = c(273L, 181L, 150L,
201L, 142L, 165L, 235L, 219L, 298L, 222L, 143L, 134L, 268L, 160L,
236L, 225L, 260L, 140L, 162L, 132L, 156L, 140L, 279L, 314L, 215L,
174L, 129L, 148L, 153L, 245L), swt = c(1645, 3318, 2280, 1574,
4062, 1627, 14604, 24675, 975, 975, 2697, 1559, 1737.58, 1730.23,
19521.36, 28080.57, 1248.43, 13745.77, 5251.76464426326, 6497.194885522,
15915.7023420765, 3740.96809540218, 16574.177622509, 307.32513798849,
4720.89748295751, 3247.78896499604, 7698.70949077031, 1262.6450411464,
6609.43340735515, 4254.23723479882)), .Names = c("seqn", "sex",
"bmi", "tc", "swt"), row.names = c(20560L, 20561L, 20562L, 20563L,
20565L, 20566L, 20567L, 20568L, 20569L, 20570L, 20571L, 20572L,
61335L, 61336L, 61338L, 61339L, 61340L, 61341L, 95465L, 96890L,
104613L, 105988L, 107581L, 112267L, 113403L, 114292L, 119979L,
120271L, 125939L, 135699L), class = "data.frame")
library(data.table)
dt = data.table(dp, key = 'sex')
sapply(dp, function(x) weighted.mean(x, dp$swt))  # this works: overall weighted means
dt[, lapply(.SD, mean, na.rm = TRUE), .SDcols = c('bmi','tc','swt')]
# this also works: overall unweighted means
dt[, lapply(.SD, function(x) weighted.mean(x, swt, na.rm = TRUE)), by = key(dt), .SDcols = c('bmi','tc','swt')]
But the last call fails with an error:
Error in weighted.mean.default(x, swt, na.rm = TRUE) :
object 'swt' not found
sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: i386-w64-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.8.6
loaded via a namespace (and not attached):
[1] tools_2.15.2
Recommended answer
UPDATE (from Arun): This is now fixed in v1.8.11. From NEWS:
o DT[, lapply(.SD, function(), by=] did not see columns of DT when optimisation is "on". This is now fixed, #2381. Tests added and tested successfully. Thanks to David F for reporting on SO: data.table and stratified means
This is indeed a bug introduced somewhere between 1.8.2 and 1.8.6.
dt[,lapply(.SD, function(x) weighted.mean(x,swt, na.rm=TRUE)), by=key(dt),
.SDcols=c('bmi','tc','swt')]
Error in weighted.mean.default(x, swt, na.rm = TRUE) :
object 'swt' not found
To work around this in the meantime, either turn off optimization:
options(datatable.optimize=FALSE)
dt[,lapply(.SD, function(x)weighted.mean(x,swt, na.rm=TRUE)), by=key(dt),
.SDcols=c('bmi','tc','swt')]
sex bmi tc swt
1: 1 25.64376 206.0115 17171.20
2: 2 23.73566 193.8727 11467.47
or don't wrap with function():
options(datatable.optimize=TRUE)
dt[,lapply(.SD, weighted.mean, swt, na.rm=TRUE), by=key(dt),
.SDcols=c('bmi','tc','swt')]
sex bmi tc swt
1: 1 25.64376 206.0115 17171.20
2: 2 23.73566 193.8727 11467.47
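As a sanity check on what the stratified call computes, the un-wrapped form can be compared against a plain base R split-and-apply. This is only a sketch on a small made-up table (`toy` and `toy_dt` are hypothetical names, not the poster's `dp`), and it assumes a data.table version where the un-wrapped form works as shown above.

```r
library(data.table)

# Hypothetical toy data (not the poster's dp): two strata, unequal weights.
toy <- data.frame(
  sex = c(1L, 1L, 2L, 2L),
  bmi = c(20, 30, 22, 26),
  swt = c(1, 3, 1, 1)
)
toy_dt <- data.table(toy, key = "sex")

# data.table form from the answer: weighted.mean passed directly to lapply,
# with the weight column swt supplied as an extra argument.
res_dt <- toy_dt[, lapply(.SD, weighted.mean, swt, na.rm = TRUE),
                 by = key(toy_dt), .SDcols = "bmi"]

# Base R cross-check: split the rows by stratum, then weight within each piece.
res_base <- sapply(split(toy, toy$sex),
                   function(d) weighted.mean(d$bmi, d$swt, na.rm = TRUE))

print(res_dt)
print(res_base)

# Both routes should agree on the per-stratum weighted means.
stopifnot(isTRUE(all.equal(res_dt$bmi, unname(res_base))))
```

The base R version never goes through data.table's j optimization, so it is a convenient reference when a grouped result looks suspicious.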
We are making more use of optimization now, but this case slipped through the test suite: tests 825.1, 825.2 and 825.3 didn't cover an argument to a function being another column, within an anonymous function(). It would be a problem where the function isn't already given; i.e., unlike this case, where the function() can just be omitted since weighted.mean is already given and can be applied as-is.
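When no ready-made function exists, one possible way to stay on the un-wrapped form is to define the function by name first and pass the other column as an extra argument, just as with weighted.mean. The sketch below does that with a hypothetical helper `wvar` (a weighted variance, not from the original post) on made-up data. It is an assumption that this sidesteps the bug in the affected releases; it has only been reasoned through for versions where the weighted.mean form works.

```r
library(data.table)

# Hypothetical helper (not from the original post): a weighted variance,
# written as a named function so lapply() can take it without function().
wvar <- function(x, w, na.rm = FALSE) {
  if (na.rm) {
    keep <- !is.na(x) & !is.na(w)
    x <- x[keep]
    w <- w[keep]
  }
  m <- sum(w * x) / sum(w)       # weighted mean
  sum(w * (x - m)^2) / sum(w)    # weighted mean squared deviation
}

# Hypothetical toy data: two strata, unequal weights.
toy_dt <- data.table(
  sex = c(1L, 1L, 2L, 2L),
  bmi = c(20, 30, 22, 26),
  swt = c(1, 3, 1, 1),
  key = "sex"
)

# The weight column swt is passed as a plain extra argument to lapply().
res <- toy_dt[, lapply(.SD, wvar, swt, na.rm = TRUE),
              by = key(toy_dt), .SDcols = "bmi"]
print(res)
```

The design choice is simply to move the body of the anonymous function into a named one, so the call has the same shape as the weighted.mean example.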
You can see how optimization modifies j by setting verbose=TRUE (either per query or with the global option). In this case nothing would have been revealed as wrong by that verbose output, but just mentioning it as an aside.
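For reference, the two ways of switching on verbose output might look like the sketch below. The table is made up for illustration (`toy_dt` is a hypothetical name), and the exact messages printed differ between data.table versions, so no expected output is shown.

```r
library(data.table)

# Hypothetical toy data: two strata, unequal weights.
toy_dt <- data.table(
  sex = c(1L, 1L, 2L, 2L),
  bmi = c(20, 30, 22, 26),
  swt = c(1, 3, 1, 1),
  key = "sex"
)

# Per query: verbose is an argument of [.data.table.
res <- toy_dt[, lapply(.SD, weighted.mean, swt, na.rm = TRUE),
              by = key(toy_dt), .SDcols = "bmi", verbose = TRUE]

# Globally: set the option, run queries, then restore the previous setting.
old <- options(datatable.verbose = TRUE)
toy_dt[, lapply(.SD, mean), by = key(toy_dt), .SDcols = "bmi"]
options(old)
```

Saving the return value of options() and passing it back afterwards keeps the global setting from leaking into later queries.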
Now filed as #2381: Optimization of lapply(.SD, function()...) no longer sees columns inside ... .
Thanks!