(R) Cumulatively Count Gaps in Sequential Numbers

Question

I have a tricky problem I'm trying to solve:

I have data that looks like the following sample:

UniqueID  Month  
ABC123    1       
ABC123    2      
ABC123    3      
ABC123    4      
ABC123    6      
ABC123    7      
DEF456    3      
DEF456    4      
DEF456    10     
DEF456    11     
DEF456    12     
DEF456    14     
GHI789    2      
GHI789    3  
JKL012    12     
JKL012    13     
JKL012    14         

Each UniqueID appears at most once per month. The Month column is a month index: for example, 1 = October 2018, 2 = November 2018, and so on. We have a total of 14 different months of data. For each UniqueID, I want a cumulative count that increases by one each time a month is skipped, and by one more on the final row if that UniqueID's last month is not 14. The starting month does not factor into the calculation. The result would look like the following sample:
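
To make the rule concrete, here is a minimal base R sketch for a single ID. It assumes the months are already sorted in ascending order and that 14 is the final month in the data; the variable names (`gap`, `ends_early`, `last_month`) are illustrative only and not part of the original post.

```r
# Months observed for one ID (ABC123 in the sample data)
month <- c(1L, 2L, 3L, 4L, 6L, 7L)
last_month <- 14L  # assumption: 14 is the final month index in the data

# A row is a gap if it jumps more than one month past the previous row.
# The first row has no previous row, so it is never counted as a gap.
gap <- c(FALSE, diff(month) > 1)

# The final row also counts if the ID stops before the last month.
ends_early <- seq_along(month) == length(month) & month != last_month

# Running total of both kinds of skip
count_skip <- cumsum(gap | ends_early)
count_skip
```

Under this reading, a final row that is both a gap and an early end still adds only one to the count.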

UniqueID  Month  CountSkip
ABC123    1      0  
ABC123    2      0
ABC123    3      0
ABC123    4      0
ABC123    6      1
ABC123    7      2
DEF456    3      0
DEF456    4      0
DEF456    10     1
DEF456    11     1
DEF456    12     1
DEF456    14     2
GHI789    2      0
GHI789    3      1
JKL012    12     0
JKL012    13     0
JKL012    14     0

I have a snippet to calculate the total number of skips by doing the following:

data %>% 
  group_by(UniqueID) %>%
  mutate(Skipped = sum(diff(Month) > 1))

How could I modify this to cumulatively count the skips and also account for the last month value not being 14?

Any help would be appreciated! Thank you!

Recommended answer

Here is one approach. Let me know if this has the logic you are looking for.

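library(tidyverse)

data %>%
  group_by(UniqueID) %>%
  # Skip is 1 when a row jumps more than one month past the previous row,
  # or when it is the ID's last row and that month is not 14.
  # The lag() default of first(Month) - 1 keeps the first row from counting as a gap.
  mutate(Skip = if_else(Month - lag(Month, default = first(Month) - 1) - 1 > 0 | 
                          (Month == last(Month) & Month != 14), 1, 0),
         # Running total of skips within each UniqueID
         CountSkip = cumsum(Skip))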

# A tibble: 17 x 4
# Groups:   UniqueID [4]
   UniqueID Month  Skip CountSkip
   <chr>    <int> <dbl>     <dbl>
 1 ABC123       1     0         0
 2 ABC123       2     0         0
 3 ABC123       3     0         0
 4 ABC123       4     0         0
 5 ABC123       6     1         1
 6 ABC123       7     1         2
 7 DEF456       3     0         0
 8 DEF456       4     0         0
 9 DEF456      10     1         1
10 DEF456      11     0         1
11 DEF456      12     0         1
12 DEF456      14     1         2
13 GHI789       2     0         0
14 GHI789       3     1         1
15 JKL012      12     0         0
16 JKL012      13     0         0
17 JKL012      14     0         0
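
As a cross-check, the same logic can be sketched in base R without dplyr. This is only an illustration, not the answer's method: `count_skips()` is a hypothetical helper, and it assumes the rows are sorted by Month within each UniqueID and that 14 is the final month.

```r
# Sample data, rebuilt here so the block runs on its own
data <- data.frame(
  UniqueID = rep(c("ABC123", "DEF456", "GHI789", "JKL012"), times = c(6, 6, 2, 3)),
  Month = c(1L, 2L, 3L, 4L, 6L, 7L, 3L, 4L, 10L, 11L, 12L, 14L,
            2L, 3L, 12L, 13L, 14L)
)

# Hypothetical helper: cumulative skip count for one ID's sorted months
count_skips <- function(month, last_month = 14L) {
  n <- length(month)
  gap <- c(FALSE, diff(month) > 1)                    # jump of more than one month
  ends_early <- seq_len(n) == n & month != last_month # last row stops before month 14
  cumsum(gap | ends_early)
}

# ave() applies the helper within each UniqueID and keeps the original row order
data$CountSkip <- ave(data$Month, data$UniqueID, FUN = count_skips)
data
```

The resulting CountSkip column can be compared against the CountSkip column printed above.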

Data (from @akrun)

data <- structure(list(UniqueID = c("ABC123", "ABC123", "ABC123", "ABC123", 
                                    "ABC123", "ABC123", "DEF456", "DEF456", "DEF456", "DEF456", "DEF456", 
                                    "DEF456", "GHI789", "GHI789", "JKL012", "JKL012", "JKL012"), 
                       Month = c(1L, 2L, 3L, 4L, 6L, 7L, 3L, 4L, 10L, 11L, 12L, 
                                 14L, 2L, 3L, 12L, 13L, 14L)), class = "data.frame", row.names = c(NA, 
                                                                                                   -17L))
