Summarizing data by subgroups


Problem description

My dataset is as follows:

Org_ID   Market volume   Indicator variable
1        100             1
1        200             0
1        300             0
2         50             1
2        500             1
3        400             0
3        200             0
3        300             0
3        100             0

I want to summarize it by market TRx and Org_ID: for each Org_ID, calculate the percentage of market volume that comes from rows where the indicator variable is 0, as follows:

Org_ID   % of 0's by market volume
1        83.3%
2        0%
3        100%

I tried grouping by subgroups but can't seem to get this to work. Can anyone suggest some ways to do it?

Recommended answer

Using dplyr:

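library(dplyr)

df %>%
  group_by(Org_ID) %>%
  # Volume from rows with indicator 0 (the logical is coerced to 0/1), and total volume
  summarize(sum_market_vol = sum(Market_volume * (Indicator_variable == 0)),
            tot_market_vol = sum(Market_volume)) %>%
  # Keep only the group key and the percentage
  transmute(Org_ID, Perc_Market_Vol = 100 * sum_market_vol / tot_market_vol)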

Result:

# A tibble: 3 x 2
  Org_ID Perc_Market_Vol
   <int>           <dbl>
1      1        83.33333
2      2         0.00000
3      3       100.00000

Data:

df = structure(list(Org_ID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L), 
    Market_volume = c(100L, 200L, 300L, 50L, 500L, 400L, 200L, 
    300L, 100L), Indicator_variable = c(1L, 0L, 0L, 1L, 1L, 0L, 
    0L, 0L, 0L)), .Names = c("Org_ID", "Market_volume", "Indicator_variable"
), class = "data.frame", row.names = c(NA, -9L))
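
If dplyr is not available, the same per-group percentage can be sketched in base R. This is only an illustrative alternative, not part of the original answer; it rebuilds the sample data with data.frame(), and the names zero_vol, total_vol, and result are made up for the example.

```r
# Sample data, same values as in the question
df <- data.frame(
  Org_ID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L),
  Market_volume = c(100L, 200L, 300L, 50L, 500L, 400L, 200L, 300L, 100L),
  Indicator_variable = c(1L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L)
)

# Per Org_ID: volume contributed by rows whose indicator is 0
# (the logical comparison is coerced to 0/1 when multiplied)
zero_vol <- tapply(df$Market_volume * (df$Indicator_variable == 0), df$Org_ID, sum)

# Per Org_ID: total volume
total_vol <- tapply(df$Market_volume, df$Org_ID, sum)

# Combine into a data frame of percentages (hypothetical names)
result <- data.frame(
  Org_ID = as.integer(names(total_vol)),
  Perc_Market_Vol = as.numeric(100 * zero_vol / total_vol)
)

print(result)
```

For Org_ID 1, for instance, this works out to 100 * (200 + 300) / (100 + 200 + 300), which matches the 83.3% expected in the question.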
