How to add additional columns using tidyr group_by function in R?


Question

This question is a follow-up to my post from this answer.

Data

df1 <- structure(list(Date = c("6/24/2020", "6/24/2020", "6/24/2020", 
"6/24/2020", "6/25/2020", "6/25/2020"), Market = c("A", "A", 
"A", "A", "A", "A"), Salesman = c("MF", "RP", "RP", "FR", "MF", 
"MF"), Product = c("Apple", "Apple", "Banana", "Orange", "Apple", 
"Banana"), Quantity = c(20L, 15L, 20L, 20L, 10L, 15L), Price = c(1L, 
1L, 2L, 3L, 1L, 1L), Cost = c(0.5, 0.5, 0.5, 0.5, 0.6, 0.6)), 
class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))

Solution

library(dplyr) # 1.0.0
library(tidyr)
df1 %>%
    group_by(Date, Market) %>% 
    group_by(Revenue = c(Quantity %*% Price), 
             TotalCost = c(Quantity %*% Cost),
             Product, .add = TRUE) %>% 
    summarise(Sold = sum(Quantity)) %>% 
    pivot_wider(names_from = Product, values_from = Sold)
# A tibble: 2 x 7
# Groups:   Date, Market, Revenue, TotalCost [2]
#  Date      Market Revenue TotalCost Apple Banana Orange
#  <chr>     <chr>    <dbl>     <dbl> <int>  <int>  <int>
#1 6/24/2020 A          135      37.5    35     20     20
#2 6/25/2020 A           25      15      10     15     NA

@akrun's solution works well. Now I'd like to know how to add three more columns, one per salesman, holding the quantity each salesman sold, so that the final output looks like this:

Date        Market  Revenue Total Cost  Apples Sold Bananas Sold    Oranges Sold    MF  RP  FR
6/24/2020   A       135     37.5        35          20              20              20  35  20
6/25/2020   A       25      15          15          25              NA              25  NA  NA

Recommended answer

One option is to do the two grouped summaries separately, since they aggregate over different columns (Product and Salesman), and then join the results on the common columns, i.e. 'Date' and 'Market'.

library(dplyr)
library(tidyr)
out1 <- df1 %>%
           group_by(Date, Market) %>% 
           group_by(Revenue = c(Quantity %*% Price), 
                    TotalCost = c(Quantity %*% Cost),
                     Product, .add = TRUE) %>% 
          summarise(Sold = sum(Quantity)) %>% 
          pivot_wider(names_from = Product, values_from = Sold)
out2 <- df1 %>% 
          group_by(Date, Market, Salesman) %>% 
          summarise(SalesSold = sum(Quantity)) %>% 
          pivot_wider(names_from = Salesman, values_from = SalesSold)

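# Name the join keys explicitly instead of relying on the automatic match
left_join(out1, out2, by = c("Date", "Market"))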
# A tibble: 2 x 10
# Groups:   Date, Market, Revenue, TotalCost [2]
#  Date      Market Revenue TotalCost Apple Banana Orange    FR    MF    RP
#  <chr>     <chr>    <dbl>     <dbl> <int>  <int>  <int> <int> <int> <int>
#1 6/24/2020 A          135      37.5    35     20     20    20    20    35
#2 6/25/2020 A           25      15      10     15     NA    NA    25    NA
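
The same idea can also be written as a rough sketch with a small helper, so that the Product and Salesman summaries share one code path. This is only an illustration, not part of the original answer: the helper name `sold_wide` and the intermediate names `totals` and `result` are made up here, and `sum(Quantity * Price)` is used in place of `c(Quantity %*% Price)`, which gives the same per-group total. It assumes dplyr 1.0.0 or later for the `.groups` argument.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question, rebuilt so the block is self-contained
df1 <- data.frame(
  Date = c("6/24/2020", "6/24/2020", "6/24/2020",
           "6/24/2020", "6/25/2020", "6/25/2020"),
  Market = "A",
  Salesman = c("MF", "RP", "RP", "FR", "MF", "MF"),
  Product = c("Apple", "Apple", "Banana", "Orange", "Apple", "Banana"),
  Quantity = c(20L, 15L, 20L, 20L, 10L, 15L),
  Price = c(1L, 1L, 2L, 3L, 1L, 1L),
  Cost = c(0.5, 0.5, 0.5, 0.5, 0.6, 0.6)
)

# Hypothetical helper: total Quantity per Date/Market and per level of `key`,
# spread into one column per level of `key`
sold_wide <- function(data, key) {
  data %>%
    group_by(Date, Market, {{ key }}) %>%
    summarise(Sold = sum(Quantity), .groups = "drop") %>%
    pivot_wider(names_from = {{ key }}, values_from = Sold)
}

# Revenue and total cost per Date/Market
totals <- df1 %>%
  group_by(Date, Market) %>%
  summarise(Revenue = sum(Quantity * Price),
            TotalCost = sum(Quantity * Cost),
            .groups = "drop")

# Join the per-product and per-salesman columns onto the totals
result <- totals %>%
  left_join(sold_wide(df1, Product), by = c("Date", "Market")) %>%
  left_join(sold_wide(df1, Salesman), by = c("Date", "Market"))

result
```

Dropping the grouping with `.groups = "drop"` returns an ungrouped tibble, and a salesman with no sales on a given date shows up as `NA`, as in the output above.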
