(R) Cumulatively Count Gaps in Sequential Numbers
Question
I have a tricky problem I'm trying to solve:
I have data that looks like the following sample:
UniqueID Month
ABC123 1
ABC123 2
ABC123 3
ABC123 4
ABC123 6
ABC123 7
DEF456 3
DEF456 4
DEF456 10
DEF456 11
DEF456 12
DEF456 14
GHI789 2
GHI789 3
JKL012 12
JKL012 13
JKL012 14
Each UniqueID appears at most once per month. The Month column is an index for a calendar month: 1 = October 2018, 2 = November 2018, and so on, for 14 months of data in total. For each UniqueID I want a running count that goes up by one every time a month is skipped, and also on the final row if that UniqueID's last month is not 14. The starting month does not factor into the calculation. For the sample above, the expected result is:
UniqueID Month CountSkip
ABC123 1 0
ABC123 2 0
ABC123 3 0
ABC123 4 0
ABC123 6 1
ABC123 7 2
DEF456 3 0
DEF456 4 0
DEF456 10 1
DEF456 11 1
DEF456 12 1
DEF456 14 2
GHI789 2 0
GHI789 3 1
JKL012 12 0
JKL012 13 0
JKL012 14 0
I have a snippet to calculate the total number of skips by doing the following:
library(dplyr)

data %>%
  group_by(UniqueID) %>%
  # total number of gaps per UniqueID, repeated on every row of the group
  mutate(Skipped = sum(diff(Month) > 1))
How could I modify this to cumulatively count the skips and also account for the last month value not being 14?
Any help would be appreciated! Thank you!
Recommended answer
Here is one approach. Let me know if this has the logic you are looking for.
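library(tidyverse)

data %>%
  group_by(UniqueID) %>%
  mutate(
    # 1 when a month was skipped before this row, or when this is the
    # group's last row and it is not month 14; otherwise 0
    Skip = if_else(Month - lag(Month, default = first(Month) - 1) - 1 > 0 |
                     (Month == last(Month) & Month != 14), 1, 0),
    # running total of the flags within each UniqueID
    CountSkip = cumsum(Skip)
  )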
# A tibble: 17 x 4
# Groups: UniqueID [4]
UniqueID Month Skip CountSkip
<chr> <int> <dbl> <dbl>
1 ABC123 1 0 0
2 ABC123 2 0 0
3 ABC123 3 0 0
4 ABC123 4 0 0
5 ABC123 6 1 1
6 ABC123 7 1 2
7 DEF456 3 0 0
8 DEF456 4 0 0
9 DEF456 10 1 1
10 DEF456 11 0 1
11 DEF456 12 0 1
12 DEF456 14 1 2
13 GHI789 2 0 0
14 GHI789 3 1 1
15 JKL012 12 0 0
16 JKL012 13 0 0
17 JKL012 14 0 0
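To make the two conditions easier to follow, here is a small base R sketch that walks through them for a single ID (the ABC123 months from the question). The variable names `gap` and `ends_early` are only illustrative and are not part of the answer. Note that, like the answer, it combines the conditions with "or", so a final row that both follows a gap and falls short of month 14 would count once.

```r
# Months for one UniqueID (ABC123 in the question's sample)
month <- c(1, 2, 3, 4, 6, 7)

# TRUE where the step from the previous row is more than one month;
# the first row has no previous row, so it is never a gap
gap <- c(FALSE, diff(month) > 1)

# TRUE only on the last row, and only if that row is not month 14
ends_early <- seq_along(month) == length(month) & month != 14

# Flag a row when either condition holds, then accumulate the flags
skip <- as.integer(gap | ends_early)
count_skip <- cumsum(skip)

print(rbind(month, skip, count_skip))
```

For this ID the gap condition fires on month 6 and the last-row condition fires on month 7, which gives the running count 0, 0, 0, 0, 1, 2.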
Data (from @akrun)
data <- structure(list(UniqueID = c("ABC123", "ABC123", "ABC123", "ABC123",
"ABC123", "ABC123", "DEF456", "DEF456", "DEF456", "DEF456", "DEF456",
"DEF456", "GHI789", "GHI789", "JKL012", "JKL012", "JKL012"),
Month = c(1L, 2L, 3L, 4L, 6L, 7L, 3L, 4L, 10L, 11L, 12L,
14L, 2L, 3L, 12L, 13L, 14L)), class = "data.frame", row.names = c(NA,
-17L))
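If dplyr is not available, the same logic can be sketched in base R with `ave()`, which applies a function within each group. This is an alternative to the answer above, not the answer's own method. The helper name `running_skips` and its `final_month` argument are hypothetical, the sample data is rebuilt compactly so the block runs on its own, and the rows are assumed to be sorted by Month within each UniqueID.

```r
# Same sample data as above, rebuilt compactly
data <- data.frame(
  UniqueID = rep(c("ABC123", "DEF456", "GHI789", "JKL012"), times = c(6, 6, 2, 3)),
  Month = c(1L, 2L, 3L, 4L, 6L, 7L, 3L, 4L, 10L, 11L, 12L, 14L, 2L, 3L, 12L, 13L, 14L)
)

# Hypothetical helper: running count of skipped months for one ID,
# plus one on the last row when it is not final_month
running_skips <- function(month, final_month = 14L) {
  gap <- c(FALSE, diff(month) > 1)
  ends_early <- seq_along(month) == length(month) & month != final_month
  cumsum(gap | ends_early)
}

# Apply the helper within each UniqueID
data$CountSkip <- ave(data$Month, data$UniqueID, FUN = running_skips)

# Compare with the CountSkip column the question asks for
expected <- c(0, 0, 0, 0, 1, 2, 0, 0, 1, 1, 1, 2, 0, 1, 0, 0, 0)
stopifnot(all(data$CountSkip == expected))
```

Keeping the final month as an argument means the value 14 is not hard-coded if more months of data arrive later.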