Weighted means by group and column

Views: 177 | Published: 2018/1/27 23:00:53 | Tags: r, for-loop, apply, sapply

Question

I wish to obtain weighted means by group for each of several (actually about 60) columns. This question is very similar to one that was just asked: "repeatedly applying ave for computing group means in a data frame".

I have come up with two ways to obtain the weighted means so far:

1. Use a separate sapply statement for each column.
2. Place an sapply statement inside a for-loop.

However, I feel there must be a way to insert an apply statement inside the sapply statement or vice versa, thereby eliminating the for-loop. I have tried numerous permutations without success. I also looked at the sweep function.
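One way to nest the calls as described, sketched here under the assumption that the data are the nine-row example from the question and that the value columns are named y1980, y1990 and y2000, is to call sapply over the columns inside the function that the outer sapply applies to each group. The outer sapply visits groups; the inner sapply visits columns and passes that group's weights to weighted.mean, so no for-loop is needed.

```r
# Example data from the question (same nine rows)
df <- read.table(text = "
region state county weights y1980 y1990 y2000
1 1 1  10  100 200  50
1 1 2   5   50 100 200
1 1 3 120 1000 500 250
1 1 4   2   25 100 400
1 1 4  15  125 150 200
2 2 1   1   10  50 150
2 2 2  10   10  10 200
2 2 2  40   40 100  30
2 2 3  20  100 100  10
", header = TRUE)
df$group <- paste(df$region, df$state, df$county, sep = "_")

# Columns to average; with ~60 columns this could be built with
# something like grep("^y", names(df), value = TRUE)
cols <- c("y1980", "y1990", "y2000")

# Outer sapply: one call per group.
# Inner sapply: one weighted.mean per column, always with that group's weights.
# sapply puts groups in columns, so t() turns groups into rows.
wm <- t(sapply(split(df, df$group), function(x)
  sapply(x[cols], weighted.mean, w = x$weights)))
wm
```

The group names end up as row names of wm, so rownames(wm) can serve as the key if the result has to be merged back onto df, as the question does with y2.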

Here is the code I have so far.
```r
df <- read.table(text= "
region state county weights y1980 y1990 y2000
1      1     1      10      100   200   50
1      1     2       5       50   100  200
1      1     3     120     1000   500  250
1      1     4       2       25   100  400
1      1     4      15      125   150  200
2      2     1       1       10    50  150
2      2     2      10       10    10  200
2      2     2      40       40   100   30
2      2     3      20      100   100   10
", header=TRUE, na.strings=NA)

# add a group variable to the data set
group <- paste(df$region, '_', df$state, '_', df$county, sep = "")
df <- data.frame(group, df)

# obtain weighted averages for y1980, y1990 and y2000
# one column at a time using one sapply per column
sapply(split(df, df$group), function(x) weighted.mean(x$y1980, w = x$weights))
sapply(split(df, df$group), function(x) weighted.mean(x$y1990, w = x$weights))
sapply(split(df, df$group), function(x) weighted.mean(x$y2000, w = x$weights))

# obtain weighted average for y1980, y1990 and y2000
# one column at a time using a for-loop
y <- matrix(NA, nrow=7, ncol=3)
group.b <- df[!duplicated(df$group), 1]
for(i in 6:8) {
    y[,(i-5)] <- sapply(split(df[,c(1:5,i)], df$group), function(x) weighted.mean(x[,6], w = x$weights))
}

# add weighted averages to the original data set
y2 <- data.frame(group.b, y)
colnames(y2) <- c('group','ave1980','ave1990','ave2000')
y2
y3 <- merge(df, y2, by=c('group'), all = TRUE)
y3
```
Sorry for all of my questions lately, and thank you for any advice.

EDITED to show y3
```
  group region state county weights y1980 y1990 y2000   ave1980  ave1990  ave2000
1 1_1_1      1     1      1      10   100   200    50  100.0000 200.0000  50.0000
2 1_1_2      1     1      2       5    50   100   200   50.0000 100.0000 200.0000
3 1_1_3      1     1      3     120  1000   500   250 1000.0000 500.0000 250.0000
4 1_1_4      1     1      4       2    25   100   400  113.2353 144.1176 223.5294
5 1_1_4      1     1      4      15   125   150   200  113.2353 144.1176 223.5294
6 2_2_1      2     2      1       1    10    50   150   10.0000  50.0000 150.0000
7 2_2_2      2     2      2      10    10    10   200   34.0000  82.0000  64.0000
8 2_2_2      2     2      2      40    40   100    30   34.0000  82.0000  64.0000
9 2_2_3      2     2      3      20   100   100    10  100.0000 100.0000  10.0000
```
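For reference, weighted.mean gives, for each group g, the sum of value times weight divided by the sum of weights; single-row groups therefore just return their own value. As a hand check of one row of y3, group 1_1_4 (rows 4 and 5) has weights 2 and 15:

```latex
\bar{y}_g = \frac{\sum_{i \in g} w_i \, y_i}{\sum_{i \in g} w_i},
\qquad
\bar{y}^{(1980)}_{1\_1\_4} = \frac{2 \cdot 25 + 15 \cdot 125}{2 + 15} = \frac{1925}{17} \approx 113.2353
```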

Solution
I suggest using the data.table package:
```r
library(data.table)
# assumes df is the data as read by read.table, without the added 'group' column;
# otherwise .SD would also contain that non-numeric column
dt <- as.data.table(df)
dt2 <- dt[, lapply(.SD, weighted.mean, w = weights), by = list(region, state, county)]
print(dt2)
```

```
   region state county   weights     y1980    y1990    y2000
1:      1     1      1  10.00000  100.0000 200.0000  50.0000
2:      1     1      2   5.00000   50.0000 100.0000 200.0000
3:      1     1      3 120.00000 1000.0000 500.0000 250.0000
4:      1     1      4  13.47059  113.2353 144.1176 223.5294
5:      2     2      1   1.00000   10.0000  50.0000 150.0000
6:      2     2      2  34.00000   34.0000  82.0000  64.0000
7:      2     2      3  20.00000  100.0000 100.0000  10.0000
```
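The data.table call averages every non-grouping column, including weights itself: the 13.47059 shown for county 4 is (2·2 + 15·15)/17 = 229/17. If only the year columns are wanted, a base-R sketch that keeps region, state and county as columns in the same way could look like the following. It swaps data.table for base split/do.call(rbind), and assumes the question's column names; the variable names keys, cols, grp, pieces and res are illustrative.

```r
# Example data from the question, as first read (no 'group' column)
df <- read.table(text = "
region state county weights y1980 y1990 y2000
1 1 1  10  100 200  50
1 1 2   5   50 100 200
1 1 3 120 1000 500 250
1 1 4   2   25 100 400
1 1 4  15  125 150 200
2 2 1   1   10  50 150
2 2 2  10   10  10 200
2 2 2  40   40 100  30
2 2 3  20  100 100  10
", header = TRUE)

keys <- c("region", "state", "county")
cols <- c("y1980", "y1990", "y2000")

# One factor level per observed region/state/county combination, region varying slowest
grp <- interaction(df[keys], drop = TRUE, lex.order = TRUE)

# For each group: key values from its first row, then one weighted mean per year column
pieces <- lapply(split(df, grp), function(x)
  data.frame(x[1, keys], lapply(x[cols], weighted.mean, w = x$weights)))
res <- do.call(rbind, pieces)
rownames(res) <- NULL
res
```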
If you want, you can merge the result with the original data.table afterwards:

```r
merge(dt, dt2, by = c("region", "state", "county"))
```

```
   region state county weights.x y1980.x y1990.x y2000.x weights.y   y1980.y  y1990.y  y2000.y
1:      1     1      1        10     100     200      50  10.00000  100.0000 200.0000  50.0000
2:      1     1      2         5      50     100     200   5.00000   50.0000 100.0000 200.0000
3:      1     1      3       120    1000     500     250 120.00000 1000.0000 500.0000 250.0000
4:      1     1      4         2      25     100     400  13.47059  113.2353 144.1176 223.5294
5:      1     1      4        15     125     150     200  13.47059  113.2353 144.1176 223.5294
6:      2     2      1         1      10      50     150   1.00000   10.0000  50.0000 150.0000
7:      2     2      2        10      10      10     200  34.00000   34.0000  82.0000  64.0000
8:      2     2      2        40      40     100      30  34.00000   34.0000  82.0000  64.0000
9:      2     2      3        20     100     100      10  20.00000  100.0000 100.0000  10.0000
```
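Since the question mentions ave, here is a base-R sketch that attaches the group-wise weighted means to every row directly, without building a summary table and merging. It assumes the question's example data and its ave1980/ave1990/ave2000 naming. The trick is to pass row indices to ave: FUN receives the indices of one group, computes that group's weighted mean, and ave recycles the single value across the group's rows.

```r
# Example data from the question, with the question's group key
df <- read.table(text = "
region state county weights y1980 y1990 y2000
1 1 1  10  100 200  50
1 1 2   5   50 100 200
1 1 3 120 1000 500 250
1 1 4   2   25 100 400
1 1 4  15  125 150 200
2 2 1   1   10  50 150
2 2 2  10   10  10 200
2 2 2  40   40 100  30
2 2 3  20  100 100  10
", header = TRUE)
df$group <- paste(df$region, df$state, df$county, sep = "_")

cols <- c("y1980", "y1990", "y2000")

# For each value column v: ave() splits the row indices by group, FUN returns one
# weighted mean per group, and ave() fills that value into every row of the group
wmeans <- lapply(df[cols], function(v)
  ave(seq_along(v), df$group,
      FUN = function(i) weighted.mean(v[i], df$weights[i])))

# New columns ave1980, ave1990, ave2000, as in the question's y3
df[paste0("ave", sub("^y", "", cols))] <- wmeans
df
```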
