Weighted sum of variables by groups with data.table


Question

I am looking for a way to compute the weighted sum of several variables by group with data.table. I hope the example is clear enough.

require(data.table)

dt <- data.table(matrix(1:200, nrow = 10))
dt[, gr := c(rep(1,5), rep(2,5))]
dt[, w := 2]

# Error: object 'w' not found
dt[, lapply(.SD, function(x) sum(x * w)),
   .SDcols = paste0("V", 1:4)]

# Error: object 'w' not found
dt[, lapply(.SD * w, sum),
   .SDcols = paste0("V", 1:4)]

# This works without groups
dt[, lapply(.SD, function(x) sum(x * dt$w)),
   .SDcols = paste0("V", 1:4)]

# It does not work by groups: dt$w is the whole column (10 values), not the current group's rows
dt[, lapply(.SD, function(x) sum(x * dt$w)),
   .SDcols = paste0("V", 1:4), keyby = gr]

# The expected result
dt[, list(V1 = sum(V1 * w),
          V2 = sum(V2 * w),
          V3 = sum(V3 * w),
          V4 = sum(V4 * w)), keyby = gr]

# From Arun's answer
dt[, lapply(.SD[, paste0("V", 1:4), with = FALSE],
            function(x) sum(x * w)), by = gr]


Recommended answer

Final attempt (copying Roland's answer :))

Copying @Roland's excellent answer:

print(dt[, lapply(.SD, function(x, w) sum(x*w), w=w), by=gr][, w := NULL])
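As a quick sanity check, the approach above can be compared with the hand-written result from the question. This is only a sketch: it rebuilds the question's example data, and the names cols, res and expected are introduced here for illustration. Because the call runs over every column of .SD, the extra columns are dropped afterwards.

```r
library(data.table)

# Same example data as in the question
dt <- data.table(matrix(1:200, nrow = 10))
dt[, gr := c(rep(1, 5), rep(2, 5))]
dt[, w := 2]

cols <- paste0("V", 1:4)  # columns of interest (illustrative name)

# Weighted sum over every column of .SD, passing w explicitly to the function
res <- dt[, lapply(.SD, function(x, w) sum(x * w), w = w), by = gr]
res[, w := NULL]                           # drop the meaningless sum(w * w) column
res <- res[, c("gr", cols), with = FALSE]  # keep only gr and V1..V4

# Reference result written out column by column, as in the question
expected <- dt[, list(V1 = sum(V1 * w),
                      V2 = sum(V2 * w),
                      V3 = sum(V3 * w),
                      V4 = sum(V4 * w)), by = gr]

stopifnot(isTRUE(all.equal(res, expected)))
```

If the two results differed, stopifnot() would raise an error; here it passes silently.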

Still not the most efficient one (second attempt):

Following @Roland's comment, it is indeed faster to run the operation on all columns and then remove the unwanted ones (as long as the operation itself is not time-consuming, which is the case here).

dt[, {lapply(.SD, function(x) sum(x*w))}, by=gr][, w := NULL][]

For some reason, w does not seem to be found when I don't use {}. No idea why, though.

You can also do this without .SDcols, by dropping the unwanted column from .SD when passing it to lapply, as follows:

dt[, lapply(.SD[, -1, with=FALSE], function(x) sum(x*w)), by=gr]
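# Output as posted with the answer (it does not match the dt defined in
# the question, where -1 would drop V1 rather than w):
#    gr V1  V2  V3  V4
# 1:  1 20 120 220 320
# 2:  2 70 170 270 370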

.SDcols makes .SD exclude the w column. So it is not possible to multiply by w, because w then does not exist within the scope of the .SD environment.
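Given that explanation, one way to still restrict the work to a few columns is to list w in .SDcols as well, so that it stays part of .SD, and to drop the resulting w column afterwards. The following is a minimal sketch under that assumption, using the question's example data; cols and res are names introduced here, and w is passed to the function explicitly, as in Roland's version.

```r
library(data.table)

# Same example data as in the question
dt <- data.table(matrix(1:200, nrow = 10))
dt[, gr := c(rep(1, 5), rep(2, 5))]
dt[, w := 2]

cols <- paste0("V", 1:4)  # columns to sum (illustrative name)

# Keep w inside .SD by listing it in .SDcols, then drop the sum(w * w) column
res <- dt[, lapply(.SD, function(x, w) sum(x * w), w = w),
          by = gr, .SDcols = c(cols, "w")]
res[, w := NULL]

print(res)
#    gr V1  V2  V3  V4
# 1:  1 30 130 230 330
# 2:  2 80 180 280 380
```

Compared with running over all columns, this only touches V1 to V4 and w, which may matter when the table is wide.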
