data.table: lapply a function with multicolumn output
Question
I'm using the function smean.cl.normal from the Hmisc package, which returns a vector with 3 values: the mean and the lower and upper CI. When I use it on a data.table with 2 groups, I get 2 columns and 6 rows. Is there a way to get a result with two rows, one per group, and a separate column for each of the function's outputs, i.e. the mean and the CI bounds?
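For experimenting without Hmisc installed, here is a minimal base-R sketch. mean_ci is a hypothetical stand-in, assuming smean.cl.normal returns the sample mean with a two-sided t-based 95% interval as a named vector c(Mean, Lower, Upper); it is not the Hmisc source.

```r
# Hypothetical stand-in for Hmisc::smean.cl.normal (assumption: mean plus a
# two-sided t-based confidence interval), written in base R only.
mean_ci <- function(x, conf.int = 0.95) {
  x <- x[!is.na(x)]                            # drop missing values
  n <- length(x)
  xbar <- mean(x)
  se <- sd(x) / sqrt(n)                        # standard error of the mean
  mult <- qt((1 + conf.int) / 2, df = n - 1)   # two-sided t quantile
  c(Mean = xbar, Lower = xbar - mult * se, Upper = xbar + mult * se)
}

res <- mean_ci(c(1, 2, 3, 4, 5))
print(res)  # a named numeric vector of length 3
```

The key point for the question is the return type: an atomic vector of length 3, not a list.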
library(Hmisc)
library(data.table)
dt = data.table(x = rnorm(100),
                gr = rep(c('A', 'B'), each = 50))
dt[, lapply(.SD, smean.cl.normal), by = gr, .SDcols = "x"]
Output:
   gr           x
1:  A -0.07916335
2:  A -0.33656667
3:  A  0.17823998
4:  B -0.02745333
5:  B -0.32950607
6:  B  0.27459941
Desired output:
   gr        Mean       Lower      Upper
1:  A -0.07916335 -0.33656667 0.17823998
2:  B -0.02745333 -0.32950607 0.27459941
Recommended answer
The j argument in DT[i,j,by] expects a list, so use as.list:
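A minimal sketch of why as.list matters, assuming data.table is installed and using a hypothetical stand-in mean_ci in place of smean.cl.normal: an atomic vector returned from j is stacked into a single column, while a list is spread across one column per element.

```r
library(data.table)

# Hypothetical stand-in for Hmisc::smean.cl.normal (mean and t-based 95% CI).
mean_ci <- function(x) {
  se <- sd(x) / sqrt(length(x))
  m <- qt(0.975, df = length(x) - 1)
  c(Mean = mean(x), Lower = mean(x) - m * se, Upper = mean(x) + m * se)
}

set.seed(1)
dt <- data.table(x = rnorm(100), gr = rep(c("A", "B"), each = 50))

# Atomic vector in j: stacked into ONE column, 3 rows per group (6 rows total).
long <- dt[, mean_ci(x), by = gr]

# List in j: one column PER list element, 1 row per group.
wide <- dt[, as.list(mean_ci(x)), by = gr]
print(wide)
```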
dt[,
   Reduce(c, lapply(.SD, function(x) as.list(smean.cl.normal(x)))),
   by = gr, .SDcols = "x"]
# (values differ from the question because rnorm() is not seeded)
#    gr       Mean       Lower     Upper
# 1:  A  0.1032966 -0.1899466 0.3965398
# 2:  B -0.1437617 -0.4261330 0.1386096
c(L1, L2, L3) is how lists are combined, so Reduce(c, List_o_Lists) does the trick when your .SDcols contains more than just x. do.call(c, List_o_Lists) works as well.
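With several columns in .SDcols, the combined list repeats the names Mean, Lower and Upper once per column. A sketch that prefixes each statistic with its column name and combines the per-column lists with do.call(c, ...); the x_Mean naming scheme and the mean_ci stand-in are assumptions, not part of the answer.

```r
library(data.table)

# Hypothetical stand-in for Hmisc::smean.cl.normal (mean and t-based 95% CI).
mean_ci <- function(x) {
  se <- sd(x) / sqrt(length(x))
  m <- qt(0.975, df = length(x) - 1)
  c(Mean = mean(x), Lower = mean(x) - m * se, Upper = mean(x) + m * se)
}

set.seed(2)
dt <- data.table(x = rnorm(100), y = runif(100),
                 gr = rep(c("A", "B"), each = 50))

res <- dt[, {
  # one named list of 3 statistics per column, names prefixed by the column
  per_col <- lapply(names(.SD), function(col) {
    s <- as.list(mean_ci(.SD[[col]]))
    setNames(s, paste(col, names(s), sep = "_"))
  })
  do.call(c, per_col)  # c() concatenates lists; Reduce(c, per_col) is equivalent
}, by = gr, .SDcols = c("x", "y")]
print(res)
```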
Comment
This is quite inefficient for a couple of reasons. Turn on verbose=TRUE to see that data.table doesn't like getting named lists in j:
The result of j is a named list. It's very inefficient to create the same names over and over again for each group. When j=list(...), any names are detected, removed and put back after grouping has completed, for efficiency. Using j=transform(), for example, prevents that speedup (consider changing to :=). This message may be upgraded to warning in future.
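Passing verbose = TRUE to dt[...] prints that diagnostic. One way to sidestep it, sketched here with the hypothetical mean_ci stand-in, is to return an unnamed list from j and attach the column names once with setnames(); whether this is measurably faster depends on the number of groups and the data.table version.

```r
library(data.table)

# Hypothetical stand-in for Hmisc::smean.cl.normal (mean and t-based 95% CI).
mean_ci <- function(x) {
  se <- sd(x) / sqrt(length(x))
  m <- qt(0.975, df = length(x) - 1)
  c(Mean = mean(x), Lower = mean(x) - m * se, Upper = mean(x) + m * se)
}

set.seed(3)
dt <- data.table(x = rnorm(100), gr = rep(c("A", "B"), each = 50))

# Strip the names inside j so each group returns an unnamed list
# (columns come back as V1, V2, V3) ...
res <- dt[, as.list(unname(mean_ci(x))), by = gr]

# ... then name all columns once, by reference, on the final result.
setnames(res, c("gr", "Mean", "Lower", "Upper"))
print(res)
```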
Also, you are missing out on the group-optimized versions of mean and other functions (data.table's GForce) that could be used to build your result. This may not be a big deal for your use case, though.
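A sketch of building the same table from simple grouped aggregates instead, assuming the interval is the t-based one, mean +/- qt(0.975, n - 1) * sd / sqrt(n). A j of the simple form .(mean(x), sd(x), .N) is the shape GForce can optimize (verbose = TRUE reports whether it did in your version); the CI bounds are then computed column-wise on the small per-group result.

```r
library(data.table)

set.seed(4)
dt <- data.table(x = rnorm(100), gr = rep(c("A", "B"), each = 50))

# Step 1: plain per-group aggregates, eligible for GForce optimisation.
res <- dt[, .(Mean = mean(x), SD = sd(x), N = .N), by = gr]

# Step 2: derive the t-based 95% CI column-wise (one row per group).
res[, half := qt(0.975, N - 1) * SD / sqrt(N)]
res[, `:=`(Lower = Mean - half, Upper = Mean + half)]
res[, c("SD", "N", "half") := NULL]  # drop helper columns
print(res)
```

The per-group work is reduced to built-in aggregates, and the arithmetic for the interval runs once over a vector of length equal to the number of groups.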
When you're applying this to only a single value column, this is enough:
dt[, as.list(smean.cl.normal(x)), by = gr]