Weighted sum of variables by groups with data.table
Problem description
I am looking for a way to compute the weighted sum of some variables by group with data.table. I hope the example is clear enough.
require(data.table)
dt <- data.table(matrix(1:200, nrow = 10))
dt[, gr := c(rep(1,5), rep(2,5))]
dt[, w := 2]
# Error: object 'w' not found
dt[, lapply(.SD, function(x) sum(x * w)),
   .SDcols = paste0("V", 1:4)]
# Error: object 'w' not found
dt[, lapply(.SD * w, sum),
   .SDcols = paste0("V", 1:4)]
# This works without groups
dt[, lapply(.SD, function(x) sum(x * dt$w)),
   .SDcols = paste0("V", 1:4)]
# It does not work by groups: dt$w has all 10 rows, so each 5-row
# group column is recycled against it and every sum comes out doubled
dt[, lapply(.SD, function(x) sum(x * dt$w)),
   .SDcols = paste0("V", 1:4), keyby = gr]
# The expected result
dt[, list(V1 = sum(V1 * w),
          V2 = sum(V2 * w),
          V3 = sum(V3 * w),
          V4 = sum(V4 * w)), keyby = gr]
### From Arun's answer
dt[, lapply(.SD[, paste0("V", 1:4), with = FALSE],
            function(x) sum(x * w)), by = gr]
Recommended answer
Final attempt (copying Roland's answer :))
Copying @Roland's excellent answer:
print(dt[, lapply(.SD, function(x, w) sum(x*w), w=w), by=gr][, w := NULL])
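The call above relies on lapply forwarding extra named arguments to the function it applies. A minimal base R sketch of just that mechanism, using made-up inputs (cols_list and weights are hypothetical names, not part of the question's data):

```r
# Hypothetical toy inputs, only to illustrate how lapply forwards arguments
cols_list <- list(a = 1:3, b = 4:6)
weights <- c(2, 2, 2)

# `w = weights` is passed through lapply to the anonymous function,
# so each element of cols_list is multiplied by the same weight vector
weighted_sums <- lapply(cols_list, function(x, w) sum(x * w), w = weights)
```

In the data.table call, w = w plays the same role: the right-hand w is the weight column of the current group.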
Still not the most efficient one (second attempt):
Following @Roland's comment, it's indeed faster to do the operation on all columns and then just remove the unwanted ones (as long as the operation itself is not time consuming, which is the case here).
dt[, {lapply(.SD, function(x) sum(x*w))}, by=gr][, w := NULL][]
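As a rough sketch of how this could be finished on the question's data, assuming a reasonably recent data.table: compute the weighted sums for every column, then keep only the ones asked for. The names cols, all_sums, res and expected are introduced here for illustration.

```r
library(data.table)

# Rebuild the question's data
dt <- data.table(matrix(1:200, nrow = 10))
dt[, gr := rep(c(1, 2), each = 5)]
dt[, w := 2]

cols <- paste0("V", 1:4)  # the columns we actually want

# Weighted sum of every non-grouping column (V1..V20 and w itself)
all_sums <- dt[, {lapply(.SD, function(x) sum(x * w))}, by = gr]

# Keep only the grouping column and the wanted columns
res <- all_sums[, c("gr", cols), with = FALSE]

# The explicit version from the question, for comparison
expected <- dt[, list(V1 = sum(V1 * w), V2 = sum(V2 * w),
                      V3 = sum(V3 * w), V4 = sum(V4 * w)), by = gr]
```

Selecting the wanted columns at the end also drops the summed w column, so a separate w := NULL step is not needed in this variant.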
For some reason, w seems not to be found when I don't use {}. No idea why, though.
You can do this without using .SDcols by dropping the unwanted columns from .SD as you pass it to lapply, as follows:
dt[, lapply(.SD[, -1, with=FALSE], function(x) sum(x*w)), by=gr]
# gr V1 V2 V3 V4
# 1: 1 20 120 220 320
# 2: 2 70 170 270 370
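A sketch of the same idea on the question's dt, selecting the wanted columns from .SD by name instead of dropping one by position. It assumes a reasonably recent data.table; cols and res are names introduced here.

```r
library(data.table)

# Rebuild the question's data
dt <- data.table(matrix(1:200, nrow = 10))
dt[, gr := rep(c(1, 2), each = 5)]
dt[, w := 2]

cols <- paste0("V", 1:4)

# .SD still contains w because .SDcols is not used, so w is visible
# inside the anonymous function; only V1..V4 are passed to lapply
res <- dt[, lapply(.SD[, cols, with = FALSE], function(x) sum(x * w)),
          by = gr]
```

On this dt the V1 sums come out as 30 and 80 rather than the 20 and 70 printed above, so that output was presumably produced from different sample data.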
.SDcols makes .SD exclude the w column, so it's not possible to multiply by w: it no longer exists within the scope of the .SD environment.
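To make that concrete, here is a small sketch on the question's data, assuming a reasonably recent data.table. It first checks which columns .SD holds when .SDcols is set, then shows one possible workaround that is not from the answers above: list w in .SDcols too and drop its column afterwards. The names cols, sd_names and res are introduced here.

```r
library(data.table)

# Rebuild the question's data
dt <- data.table(matrix(1:200, nrow = 10))
dt[, gr := rep(c(1, 2), each = 5)]
dt[, w := 2]

cols <- paste0("V", 1:4)

# With .SDcols set, .SD holds exactly the listed columns (no w)
sd_names <- names(dt[, .SD, .SDcols = cols])

# Workaround sketch: include w in .SDcols so it is part of .SD,
# then remove the summed w column from the result
res <- dt[, lapply(.SD, function(x) sum(x * w)),
          by = gr, .SDcols = c(cols, "w")][, w := NULL]
```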