Aggregation and percentage calculation by groups

Question

I have a dataset in R of weekly student allowances by class, which looks like this:

Year    ID  Class       Allowance
2013    123 Freshman    100
2013    234 Freshman    110
2013    345 Sophomore   150
2013    456 Sophomore   200
2013    567 Junior      250
2014    678 Junior      100
2014    789 Junior      230
2014    890 Freshman    110
2014    891 Freshman    250
2014    892 Sophomore   220

How can I summarize the results by group (Year/Class) to get the sum and the percentage (by group)? Getting the sum seems easy with ddply, but I just couldn't get the percentage-by-group part right.

It works for the sum:

summary <- ddply(my_data, .(Year, Class), summarize, Sum_Allow=sum(Allowance))

But it doesn't work for the percentage-by-group part:

summary <- ddply(my_data, .(Year, Class), summarize, Sum_Allow=sum(Allowance),
                 Allow_Pct=Allowance/sum(Allowance))

The ideal result should look like this:

 Year     Class Sum_Allow Allow_Pct
 2013  Freshman       210       26%
 2013    Junior       250       31%
 2013 Sophomore       350       43%
 2014  Freshman       360       40%
 2014    Junior       330       36%
 2014 Sophomore       220       24%

I tried ddply from the plyr package, but please let me know of any approach that would work.

Recommended answer

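You can do it in two steps: first sum Allowance within each Year/Class group, then divide each group sum by its year's total.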

my_data <- read.table(header = TRUE,
                      text = "Year    ID  Class       Allowance
2013    123 Freshman    100
2013    234 Freshman    110
2013    345 Sophomore   150
2013    456 Sophomore   200
2013    567 Junior      250
2014    678 Junior      100
2014    789 Junior      230
2014    890 Freshman    110
2014    891 Freshman    250
2014    892 Sophomore   220")

library(plyr)
(summ <- ddply(my_data, .(Year, Class), summarize, Sum_Allow=sum(Allowance)))

#   Year     Class Sum_Allow
# 1 2013  Freshman       210
# 2 2013    Junior       250
# 3 2013 Sophomore       350
# 4 2014  Freshman       360
# 5 2014    Junior       330
# 6 2014 Sophomore       220

ddply(summ, .(Year), mutate, Allow_pct = Sum_Allow / sum(Sum_Allow) * 100)

#   Year     Class Sum_Allow Allow_pct
# 1 2013  Freshman       210  25.92593
# 2 2013    Junior       250  30.86420
# 3 2013 Sophomore       350  43.20988
# 4 2014  Freshman       360  39.56044
# 5 2014    Junior       330  36.26374
# 6 2014 Sophomore       220  24.17582
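
The question asked for whole percentages such as 26%, while the output above is numeric. One possible way to get that display form is sprintf(); this is only a sketch, and the data frame res below is a hand-typed copy of the printed result (an assumption made so the snippet runs on its own, not part of the original answer).

```r
# Hand-typed copy of the two-step result printed above (illustrative only)
res <- data.frame(
  Year      = c(2013, 2013, 2013, 2014, 2014, 2014),
  Class     = c("Freshman", "Junior", "Sophomore",
                "Freshman", "Junior", "Sophomore"),
  Sum_Allow = c(210, 250, 350, 360, 330, 220),
  Allow_pct = c(25.92593, 30.86420, 43.20988, 39.56044, 36.26374, 24.17582)
)

# Round to zero decimals and append a literal percent sign ("%%" in sprintf)
res$Allow_Pct <- sprintf("%.0f%%", res$Allow_pct)

res[, c("Year", "Class", "Sum_Allow", "Allow_Pct")]
```

Note that Allow_Pct becomes a character column, so keep the numeric Allow_pct around if further arithmetic is needed.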

I don't know if this happens for the rest of you, but when I run the original attempt, R crashes rather than throwing a warning. It also crashes if I misspell Allow as allow. I really hate that; hadley pls fix

base R forever
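
For readers who prefer to avoid plyr, the same two-step idea can probably be sketched in base R with aggregate() for the group sums and ave() for the per-year totals. This is an illustrative sketch rather than the answerer's own code; the column names Sum_Allow and Allow_pct are simply reused from the plyr version above.

```r
# Rebuild the example data from the question
my_data <- data.frame(
  Year      = rep(c(2013, 2014), each = 5),
  ID        = c(123, 234, 345, 456, 567, 678, 789, 890, 891, 892),
  Class     = c("Freshman", "Freshman", "Sophomore", "Sophomore", "Junior",
                "Junior", "Junior", "Freshman", "Freshman", "Sophomore"),
  Allowance = c(100, 110, 150, 200, 250, 100, 230, 110, 250, 220)
)

# Step 1: total allowance per Year/Class combination
summ <- aggregate(Allowance ~ Year + Class, data = my_data, FUN = sum)
names(summ)[names(summ) == "Allowance"] <- "Sum_Allow"

# Step 2: ave() returns each row's year total, so the division
# gives every class's share of its own year
summ$Allow_pct <- summ$Sum_Allow / ave(summ$Sum_Allow, summ$Year, FUN = sum) * 100

# Sort to match the layout of the plyr output
summ <- summ[order(summ$Year, summ$Class), ]
rownames(summ) <- NULL
summ
```

The key point in either version is that the denominator is computed per Year, not per Year/Class; dividing within the Year/Class group is what makes the original attempt go wrong.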
