How do I compute a weighted average using summarise_each?

Question

How can I compute the weighted average of all the fields in a dataset using summarise_each in dplyr? For example, let's say we want to group the mtcars dataset by cyl and compute the weighted average of all columns where the weights are taken as the gear column. I've tried the following but could not get it to work.

mtcars %>% group_by(cyl) %>% summarise_each(funs(weighted.mean(., gear)))

# The line above gives the following output
# Error in weighted.mean.default(c(1, 2, 2, 1, 2, 1, 1, 1, 2, 2, 2), 4.15555555555556) : 
# 'x' and 'w' must have the same length

Thanks in advance for your help!

Recommended answer

To help see what's going on here, let's make a little function that returns the lengths of its arguments:

lenxy <- function(x,y)
    paste0(length(x),'-',length(y))

and then apply it in summarise_each, as in:

mtcars %>% group_by(cyl) %>% summarise_each(funs(lenxy(., qsec)))

#>   cyl   mpg  disp    hp  drat    wt  qsec   vs   am gear carb
#> 1   4 11-11 11-11 11-11 11-11 11-11 11-11 11-1 11-1 11-1 11-1
#> 2   6   7-7   7-7   7-7   7-7   7-7   7-7  7-1  7-1  7-1  7-1
#> 3   8 14-14 14-14 14-14 14-14 14-14 14-14 14-1 14-1 14-1 14-1

Looking at this table, you can see that the lengths of the first and second arguments are the same up to and including qsec; afterward, the second argument to lenxy has length 1. This is because dplyr operates on the data in place, replacing each field with its summary, rather than creating a new data.frame.
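The effect can be imitated in base R. The sketch below is only a toy model of "replace each column with its summary, left to right"; it is not dplyr's actual implementation. The names cols and result are made up for illustration, and only the cyl == 4 group is used.

```r
# Same helper as above: report the lengths of both arguments.
lenxy <- function(x, y) paste0(length(x), "-", length(y))

# One group (cyl == 4) as a plain list of columns, without the grouping column.
cols <- as.list(mtcars[mtcars$cyl == 4, names(mtcars) != "cyl"])

# Summarise the columns in order, overwriting each one with its summary.
# Once qsec has been overwritten, later columns see a length-1 qsec.
for (nm in names(cols)) {
  cols[[nm]] <- lenxy(cols[[nm]], cols$qsec)
}

result <- unlist(cols)
print(result)
```

Under this model, the columns up to and including qsec report "11-11" and the remaining ones report "11-1", which matches the first row of the table.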

The solution is easy: exclude the weighting variable from the summary:

mtcars %>% 
    group_by(cyl) %>% 
    summarise_each(funs(weighted.mean(., gear)),
                   -gear)
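As a cross-check, the same gear-weighted means can be computed in base R. This is a minimal sketch; value_cols, by_cyl, wm and wm_dplyr are illustrative names, not part of the original answer. Because summarise_each() and funs() are deprecated in current dplyr, the last few lines also show an equivalent call with across(), assuming dplyr 1.0.0 or later is installed; that part is skipped otherwise.

```r
# Columns to average: everything except the grouping and weighting columns.
value_cols <- setdiff(names(mtcars), c("cyl", "gear"))

# Split the rows by cyl, then take the gear-weighted mean of every
# remaining column within each group. wm has one row per cyl value.
by_cyl <- split(mtcars, mtcars$cyl)
wm <- t(sapply(by_cyl, function(d) {
  sapply(d[value_cols], weighted.mean, w = d$gear)
}))
print(wm)

# Equivalent with across(), only if a recent dplyr is available (assumption).
if (requireNamespace("dplyr", quietly = TRUE) &&
    utils::packageVersion("dplyr") >= "1.0.0") {
  library(dplyr)
  wm_dplyr <- mtcars %>%
    group_by(cyl) %>%
    summarise(across(-gear, ~ weighted.mean(.x, gear)))
  print(wm_dplyr)
}
```

In both versions the weighting column is left out of the columns being summarised, so it keeps its full per-group length when it is used as the weight.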
