Summarizing data by subgroups
Problem description
My dataset is as follows:
Org_ID   Market volume   Indicator variable
1        100             1
1        200             0
1        300             0
2        50              1
2        500             1
3        400             0
3        200             0
3        300             0
3        100             0
I want to summarize it by market TRx and Org_ID, calculating the percentage of market volume that comes from rows where the indicator variable is 0, as follows:
Org_ID   % of 0's by market volume
1        83.3%
2        0%
3        100%
I tried grouping into subgroups but can't get this to work. Can anyone suggest some ways to do it?
Recommended answer
With dplyr:
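library(dplyr)

df %>%
  group_by(Org_ID) %>%
  # !Indicator_variable is TRUE where the indicator is 0, so only those rows' volume is summed
  summarize(sum_market_vol = sum(Market_volume * !Indicator_variable),
            tot_market_vol = sum(Market_volume)) %>%
  # Keep Org_ID and express the 0-indicator volume as a percentage of the group total
  transmute(Org_ID, Perc_Market_Vol = 100 * sum_market_vol / tot_market_vol)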
Result:
# A tibble: 3 x 2
  Org_ID Perc_Market_Vol
   <int>           <dbl>
1      1        83.33333
2      2         0.00000
3      3       100.00000
Data:
df = structure(list(Org_ID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L),
Market_volume = c(100L, 200L, 300L, 50L, 500L, 400L, 200L,
300L, 100L), Indicator_variable = c(1L, 0L, 0L, 1L, 1L, 0L,
0L, 0L, 0L)), .Names = c("Org_ID", "Market_volume", "Indicator_variable"
), class = "data.frame", row.names = c(NA, -9L))
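For readers who would rather not load dplyr, the same calculation can probably be sketched in base R with tapply. This is an alternative illustration, not part of the original answer; the names zero_vol, total_vol, and perc are made up for this sketch, and the data frame is rebuilt here only so the snippet runs on its own.

```r
# Same data as above, rebuilt so this sketch is self-contained
df <- data.frame(
  Org_ID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L),
  Market_volume = c(100L, 200L, 300L, 50L, 500L, 400L, 200L, 300L, 100L),
  Indicator_variable = c(1L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L)
)

# Volume from rows whose indicator is 0, summed per Org_ID
# (multiplying by the logical zeroes out rows where the indicator is 1)
zero_vol <- tapply(df$Market_volume * (df$Indicator_variable == 0), df$Org_ID, sum)

# Total volume per Org_ID
total_vol <- tapply(df$Market_volume, df$Org_ID, sum)

# Assemble the result; names(total_vol) holds the Org_ID values as strings
perc <- data.frame(
  Org_ID = as.integer(names(total_vol)),
  Perc_Market_Vol = as.numeric(100 * zero_vol / total_vol)
)
print(perc)
```

Both tapply calls group by the same Org_ID vector, so the two results line up element by element before the division.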