data.table: lapply a function with multicolumn output


Question

I'm using the function smean.cl.normal from the Hmisc package, which returns a vector of 3 values: the mean and the lower and upper confidence limits. When I apply it to a data.table with 2 groups, I get 2 columns and 6 rows. Is there a way to get a result with two rows, one per group, and a separate column for each of the function's outputs, i.e. the mean and the two confidence limits?

require(Hmisc)
require(data.table)

dt = data.table(x = rnorm(100),
                gr = rep(c('A', 'B'), each = 50))

dt[, lapply(.SD, smean.cl.normal), by = gr, .SDcols = "x"]

Output:

   gr           x
1:  A -0.07916335
2:  A -0.33656667
3:  A  0.17823998
4:  B -0.02745333
5:  B -0.32950607
6:  B  0.27459941

Desired output:

   gr        Mean         Lower         Upper
1:  A -0.07916335   -0.33656667    0.17823998
2:  B -0.02745333   -0.32950607    0.27459941

Recommended answer

The j argument in DT[i,j,by] expects a list, with one element per output column, so convert the named vector with as.list:

dt[, 
  Reduce(c, lapply(.SD, function(x) as.list(smean.cl.normal(x))))
, by = gr, .SDcols = "x"]

#    gr       Mean      Lower     Upper
# 1:  A  0.1032966 -0.1899466 0.3965398
# 2:  B -0.1437617 -0.4261330 0.1386096
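
To make the vector-versus-list distinction concrete, here is a minimal sketch. It does not use Hmisc: mean_ci is a hypothetical stand-in, assumed to approximate smean.cl.normal with a t-based 95% interval, and set.seed is added only so the example is repeatable.

```r
library(data.table)

# Hypothetical stand-in for Hmisc::smean.cl.normal (assumption: t-based 95% CI)
mean_ci <- function(x) {
  m  <- mean(x)
  hw <- qt(0.975, df = length(x) - 1) * sd(x) / sqrt(length(x))
  c(Mean = m, Lower = m - hw, Upper = m + hw)
}

set.seed(1)
dt <- data.table(x = rnorm(100), gr = rep(c("A", "B"), each = 50))

# A plain vector in j becomes a single column, one row per element
long <- dt[, mean_ci(x), by = gr]
dim(long)

# A list in j becomes one column per element, so one row per group
wide <- dt[, as.list(mean_ci(x)), by = gr]
dim(wide)
names(wide)
```

The first call gives the 6-row shape from the question; the second gives one row per group with columns gr, Mean, Lower and Upper.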

c(L1, L2, L3) is how lists are combined, so Reduce(c, List_o_Lists) flattens the per-column lists into a single list, which does the trick when your .SDcols contains more than just x. I guess do.call(c, List_o_Lists) should also work.
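
The two ways of combining differ in how they name the result, which matters once there is more than one column. The following base R sketch uses made-up numbers in a hypothetical per_column list, shaped like what lapply(.SD, ...) would return for two columns x and y:

```r
# Hypothetical per-column results, shaped like as.list(smean.cl.normal(col))
per_column <- list(
  x = list(Mean = 0.1, Lower = -0.2, Upper = 0.4),
  y = list(Mean = 1.5, Lower = 1.0, Upper = 2.0)
)

# Reduce() hands only the elements to c(), so the outer names x and y are lost
flat_reduce <- Reduce(c, per_column)
names(flat_reduce)

# do.call() passes the elements as named arguments, so c() prefixes each name
flat_docall <- do.call(c, per_column)
names(flat_docall)
```

With Reduce the names Mean, Lower, Upper simply repeat, while do.call yields x.Mean, x.Lower, and so on, which is probably the more useful set of column names when several columns are summarised.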

Comments

This is quite inefficient for a couple of reasons. Turn on verbose=TRUE to see that data.table doesn't like getting named lists in j:

The result of j is a named list. It's very inefficient to create the same names over and over again for each group. When j=list(...), any names are detected, removed and put back after grouping has completed, for efficiency. Using j=transform(), for example, prevents that speedup (consider changing to :=). This message may be upgraded to warning in future.

Also, you are missing out on the group-optimized versions of mean and other functions that could probably be used to build your result. This may not be a big deal for your use case, though.
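
One way to follow that suggestion is to compute the per-group pieces with plain calls inside j = list(...) and derive the limits afterwards. This is only a sketch: the formula assumes a t-based 95% interval (the commenter did not spell one out), the helper columns SD, N and half are names introduced here, and whether data.table actually applies its grouped optimizations depends on the installed version, which verbose = TRUE will report.

```r
library(data.table)

set.seed(1)
dt <- data.table(x = rnorm(100), gr = rep(c("A", "B"), each = 50))

# Step 1: per-group summaries written directly as j = list(...)
stats <- dt[, .(Mean = mean(x), SD = sd(x), N = .N), by = gr]

# Step 2: vectorised confidence limits (assumption: t-based 95% interval)
stats[, half := qt(0.975, df = N - 1) * SD / sqrt(N)]
stats[, `:=`(Lower = Mean - half, Upper = Mean + half)]

# Keep only the columns from the desired output
result <- stats[, .(gr, Mean, Lower, Upper)]
result
```

The limits are computed once over the small summary table rather than once per group, so the per-group work is limited to mean, sd and the row count.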

When you're applying this to only a single value column, just:

dt[, as.list(smean.cl.normal(x)), by = gr]

suffices.
