Issues with cumsum in R

Question

Here are the sample data and the packages; the code I am using follows. It works for the first four rows, but after that things go awry. The desired result is at the very bottom. I need cumsum to restart for each area/period combination (for example, 001 and 2020q1), so there will be four groupings (001/2020q1, 003/2020q1, 001/2020q2, 003/2020q2). How would I go about this? I have a feeling that I am missing something in the group_by clause, but so far I am going in circles.

This is a continuation of a previous question. This has more data and is a bit more involved.

library(readxl)
library(dplyr)
library(data.table)
library(odbc)
library(DBI)
library(stringr)

employment <- c(1,45,125,130,165,260,2,46,127,132,167,265,50,61,110,121,170,305,55,66,112,123,172,310)
small <- c(1,1,2,2,3,4,1,1,2,2,3,4,1,1,2,2,3,4,1,1,2,2,3,4)
area <-c(001,001,001,001,001,001,001,001,001,001,001,001,003,003,003,003,003,003,003,003,003,003,003,003)
year<-c(2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020)
qtr <-c(1,1,1,1,1,1,2,2,2,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2)

smbtest <- data.frame(employment,small,area,year,qtr)


smbsummary2 <- smbtest %>%
  mutate(period = paste0(year, "q", qtr)) %>%
  select(area, period, employment, small) %>%
  group_by(area, period, small) %>%
  summarise(employment = sum(employment), worksites = n(),
            .groups = 'drop') %>%
  mutate(employment = cumsum(employment),
         worksites = cumsum(worksites))


area    period     small    employment    worksites
001     2020q1     1          46            2
001     2020q1     2          301           4
001     2020q1     3          466           5
001     2020q1     4          726           6
001     2020q2     1          48            2
001     2020q2     2          307           4
001     2020q2     3          474           5
001     2020q2     4          739           6
003     2020q1     1          111           2
003     2020q1     2          342           4
003     2020q1     3          512           5
003     2020q1     4          817           6
and so on.

Recommended answer

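The .groups = 'drop' argument removes all grouping, so the mutate that follows runs cumsum over the whole table instead of restarting for each area/period combination. We need .groups = 'drop_last' instead: it drops only the last grouping variable, 'small', and keeps the result grouped by area and period, which is what the expected output requires. Here summarise already behaves like .groups = 'drop_last' when the argument is omitted; specifying it explicitly simply silences the regrouping message: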

smbsummary2 <- smbtest %>%
  mutate(period = paste0(year, "q", qtr)) %>%
  select(area, period, employment, small) %>%
  group_by(area, period, small) %>%
  summarise(employment = sum(employment), worksites = n(),
            .groups = 'drop_last') %>%
  mutate(employment = cumsum(employment),
         worksites = cumsum(worksites))
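
To check that the grouping behaves as described, the pipeline can be split in two so the remaining groups can be inspected before the cumulative sums are taken. This is only a sketch on the question's sample data, assuming dplyr 1.0.0 or later (where .groups is available). Two things here are not in the original: area is stored as a character vector so the leading zeros survive, and the names summarised, result and alt are illustrative.

```r
library(dplyr)

# Sample data from the question, written compactly with rep().
# Assumption: area is kept as character so "001" is not stored as 1.
employment <- c(1, 45, 125, 130, 165, 260, 2, 46, 127, 132, 167, 265,
                50, 61, 110, 121, 170, 305, 55, 66, 112, 123, 172, 310)
small <- rep(c(1, 1, 2, 2, 3, 4), times = 4)
area  <- rep(c("001", "003"), each = 12)
qtr   <- rep(c(1, 2, 1, 2), each = 6)
smbtest <- data.frame(employment, small, area, year = 2020, qtr)

# Step 1: one row per area/period/small, dropping only the last group.
summarised <- smbtest %>%
  mutate(period = paste0(year, "q", qtr)) %>%
  group_by(area, period, small) %>%
  summarise(employment = sum(employment), worksites = n(),
            .groups = "drop_last")

# The result is still grouped by area and period.
print(group_vars(summarised))

# Step 2: cumsum now restarts within each area/period group.
result <- summarised %>%
  mutate(employment = cumsum(employment),
         worksites = cumsum(worksites)) %>%
  ungroup()

# Alternative: drop all groups, then regroup explicitly before cumsum.
alt <- summarised %>%
  ungroup() %>%
  group_by(area, period) %>%
  mutate(employment = cumsum(employment),
         worksites = cumsum(worksites)) %>%
  ungroup()

print(result)
```

Regrouping explicitly, as in alt, is a little more verbose but makes the intended grouping visible at the point where cumsum is applied, rather than relying on what summarise leaves behind.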
