使用dplyr计算R中的百分比变化 [英] Calculate Percentage Change in R using dplyr

查看:164
本文介绍了使用dplyr计算R中的百分比变化的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想通过 Year 计算利润的百分比,这是一个相当简单的任务,但是我不知何故 NA 。我检查了之前问过的相同问题,但我不明白为什么我会得到 NA 。数据如下:

I want to calculate the percentage of Profit by YEAR which is a fairly simple task but somehow I am getting NA. I have checked same questions asked before but I'm not able to understand why I am getting NA. The data is as follows:

> df_vertical_growth
   YEAR                        VERTICAL   Profit pct_change
1  2017                     AGRICULTURE        0         NA
2  2016                     AGRICULTURE  2053358         NA
3  2015                     AGRICULTURE        0         NA
4  2014                     AGRICULTURE  2370747         NA
5  2013                     AGRICULTURE  4066693         NA
6  2017                   COMMUNICATION        0         NA
7  2016                   COMMUNICATION  1680074         NA
8  2015                   COMMUNICATION  1322470         NA
9  2014                   COMMUNICATION  1460133         NA
10 2013                   COMMUNICATION  1529863         NA
11 2017                    CONSTRUCTION        0         NA
12 2016                    CONSTRUCTION        0         NA
13 2015                    CONSTRUCTION        0         NA
14 2014                    CONSTRUCTION  8250149         NA
15 2013                    CONSTRUCTION        0         NA
16 2017                       EDUCATION        0         NA
17 2016                       EDUCATION 12497015         NA
18 2015                       EDUCATION 13437356         NA
19 2014                       EDUCATION 10856685         NA
20 2013                       EDUCATION 13881127         NA
21 2017 FINANCE, INSURANCE, REAL ESTATE        0         NA
22 2016 FINANCE, INSURANCE, REAL ESTATE        0         NA
23 2015 FINANCE, INSURANCE, REAL ESTATE        0         NA
24 2014 FINANCE, INSURANCE, REAL ESTATE        0         NA
25 2013 FINANCE, INSURANCE, REAL ESTATE  5008436         NA
26 2017                      HEALTHCARE        0         NA
27 2016                      HEALTHCARE        0         NA
28 2015                      HEALTHCARE        0         NA
29 2014                      HEALTHCARE  4554364         NA
30 2013                      HEALTHCARE  5078130         NA
31 2017                     HOSPITALITY        0         NA
32 2016                     HOSPITALITY  4445512         NA
33 2015                     HOSPITALITY  5499419         NA
34 2014                     HOSPITALITY  9060639         NA
35 2013                     HOSPITALITY  4391522         NA
36 2017                   MANUFACTURING        0         NA
37 2016                   MANUFACTURING        0         NA
38 2015                   MANUFACTURING        0         NA
39 2014                   MANUFACTURING        0         NA
40 2013                   MANUFACTURING 27466974         NA
41 2017                          MINING        0         NA
42 2016                          MINING  4359251         NA
43 2015                          MINING  4163201         NA
44 2014                          MINING  6272530         NA
45 2013                          MINING  6668191         NA
46 2017                           OTHER        0         NA
47 2016                           OTHER        0         NA
48 2015                           OTHER        0         NA
49 2014                           OTHER  5935199         NA
50 2013                           OTHER  3585969         NA
51 2017                    PUBLIC ADMIN        0         NA
52 2016                    PUBLIC ADMIN        0         NA
53 2015                    PUBLIC ADMIN        0         NA
54 2014                    PUBLIC ADMIN        0         NA
55 2013                    PUBLIC ADMIN        0         NA
56 2017                    RETAIL TRADE        0         NA
57 2016                    RETAIL TRADE        0         NA
58 2015                    RETAIL TRADE        0         NA
59 2014                    RETAIL TRADE        0         NA
60 2013                    RETAIL TRADE        0         NA
61 2017                         SERVICE        0         NA
62 2016                         SERVICE        0         NA
63 2015                         SERVICE        0         NA
64 2014                         SERVICE        0         NA
65 2013                         SERVICE 28018522         NA
66 2017                  TRANSPORTATION        0         NA
67 2016                  TRANSPORTATION        0         NA
68 2015                  TRANSPORTATION        0         NA
69 2014                  TRANSPORTATION        0         NA
70 2013                  TRANSPORTATION  8430244         NA
71 2017                         UTILITY        0         NA
72 2016                         UTILITY  3551989         NA
73 2015                         UTILITY  6535248         NA
74 2014                         UTILITY  3995486         NA
75 2013                         UTILITY  4477617         NA
76 2017                 WHOLESALE TRADE        0         NA
77 2016                 WHOLESALE TRADE  6898041         NA
78 2015                 WHOLESALE TRADE  7120828         NA
79 2014                 WHOLESALE TRADE        0         NA
80 2013                 WHOLESALE TRADE        0         NA

我的代码:

df_vertical_growth %>% group_by(YEAR, VERTICAL) %>% 
     mutate(pct_change = ((Profit/lag(Profit) - 1) * 100))

现在,基于此处提供的答案如何为R中的多个列计算组中的百分比变化?,还尝试执行以下操作:

Now, based on the answers provided here How can I calculate the percentage change within a group for multiple columns in R?, also tried doing the following:

pct <- function(x) {x / lag(x) - 1}
df_vertical_growth %>% group_by(YEAR, VERTICAL) %>% mutate_at(funs=pct,Profit)

Bu t我收到以下错误消息:

But I am getting following error:


check_dot_cols(.vars,.cols)错误:找不到对象'Profit'

Error in check_dot_cols(.vars, .cols) : object 'Profit' not found

有人可以告诉我,我做错了什么吗?

Can someone please tell me, what am I doing wrong? Thanks a lot in advance.

推荐答案

问题在于每个小组都有一个观察对象。每个行业一年。一个观察的滞后是什么?另外,由于年份以降序排列,所以我相信您需要铅。

The problem lies in the fact each group has one observation. One unique year per Vertical. What is the lag of one observation? Additionally since the years go in descending order I trust you need lead.

library(tidyverse)
z %>%
  group_by(VERTICAL) %>% 
  mutate(pct_change = (Profit/lead(Profit) - 1) * 100)
#output
    YEAR VERTICAL       Profit pct_change
   <int> <fctr>          <int>      <dbl>
 1  2017 AGRICULTURE         0    -100   
 2  2016 AGRICULTURE   2053358     Inf   
 3  2015 AGRICULTURE         0    -100   
 4  2014 AGRICULTURE   2370747    - 41.7 
 5  2013 AGRICULTURE   4066693      NA   
 6  2017 COMMUNICATION       0    -100   
 7  2016 COMMUNICATION 1680074      27.0 
 8  2015 COMMUNICATION 1322470    -  9.43
 9  2014 COMMUNICATION 1460133    -  4.56
10  2013 COMMUNICATION 1529863      NA   

此解决方案假定年份按正确的顺序排列,以确保:

This solution assumes the years are arranged in the correct order, to make sure:

z %>%
  group_by(VERTICAL) %>% 
  arrange(YEAR, .by_group = TRUE) %>%
  mutate(pct_change = (Profit/lag(Profit) - 1) * 100)
#output
    YEAR VERTICAL       Profit pct_change
   <int> <fctr>          <int>      <dbl>
 1  2013 AGRICULTURE   4066693      NA   
 2  2014 AGRICULTURE   2370747    - 41.7 
 3  2015 AGRICULTURE         0    -100   
 4  2016 AGRICULTURE   2053358     Inf   
 5  2017 AGRICULTURE         0    -100   
 6  2013 COMMUNICATION 1529863      NA   
 7  2014 COMMUNICATION 1460133    -  4.56
 8  2015 COMMUNICATION 1322470    -  9.43
 9  2016 COMMUNICATION 1680074      27.0 
10  2017 COMMUNICATION       0    -100   

或使用

arrange(desc(YEAR), .by_group = TRUE)

lead

z是:

structure(list(YEAR = c(2017L, 2016L, 2015L, 2014L, 2013L, 2017L, 
2016L, 2015L, 2014L, 2013L, 2017L, 2016L, 2015L, 2014L, 2013L, 
2017L, 2016L, 2015L, 2014L, 2013L, 2017L, 2016L, 2015L, 2014L, 
2013L, 2017L, 2016L, 2015L, 2014L, 2013L, 2017L, 2016L, 2015L, 
2014L, 2013L, 2017L, 2016L, 2015L, 2014L, 2013L, 2017L, 2016L, 
2015L, 2014L, 2013L, 2017L, 2016L, 2015L, 2014L, 2013L, 2017L, 
2016L, 2015L, 2014L, 2013L, 2017L, 2016L, 2015L, 2014L, 2013L
), VERTICAL = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 
6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 9L, 
9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 
11L, 12L, 12L, 12L, 12L, 12L), .Label = c("AGRICULTURE", "COMMUNICATION", 
"CONSTRUCTION", "EDUCATION", "HEALTHCARE", "HOSPITALITY", "MANUFACTURING", 
"MINING", "OTHER", "SERVICE", "TRANSPORTATION", "UTILITY"), class = "factor"), 
    Profit = c(0L, 2053358L, 0L, 2370747L, 4066693L, 0L, 1680074L, 
    1322470L, 1460133L, 1529863L, 0L, 0L, 0L, 8250149L, 0L, 0L, 
    12497015L, 13437356L, 10856685L, 13881127L, 0L, 0L, 0L, 4554364L, 
    5078130L, 0L, 4445512L, 5499419L, 9060639L, 4391522L, 0L, 
    0L, 0L, 0L, 27466974L, 0L, 4359251L, 4163201L, 6272530L, 
    6668191L, 0L, 0L, 0L, 5935199L, 3585969L, 0L, 0L, 0L, 0L, 
    28018522L, 0L, 0L, 0L, 0L, 8430244L, 0L, 3551989L, 6535248L, 
    3995486L, 4477617L)), .Names = c("YEAR", "VERTICAL", "Profit"
), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", 
"10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", 
"26", "27", "28", "29", "30", "31", "32", "33", "34", "35", "36", 
"37", "38", "39", "40", "41", "42", "43", "44", "45", "46", "47", 
"48", "49", "50", "61", "62", "63", "64", "65", "66", "67", "68", 
"69", "70", "71", "72", "73", "74", "75"), class = "data.frame")

这篇关于使用dplyr计算R中的百分比变化的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆