Summarize consecutive failures with dplyr and rle
Question
I'm trying to build a churn model that includes the maximum number of consecutive UX failures for each customer, and I'm having trouble. Here's my simplified data and desired output:
library(dplyr)
df <- data.frame(customerId = c(1, 2, 2, 3, 3, 3),
                 date = c('2015-01-01', '2015-02-01', '2015-02-02',
                          '2015-03-01', '2015-03-02', '2015-03-03'),
                 isFailure = c(0, 0, 1, 0, 1, 1))
> df
  customerId       date isFailure
1          1 2015-01-01         0
2          2 2015-02-01         0
3          2 2015-02-02         1
4          3 2015-03-01         0
5          3 2015-03-02         1
6          3 2015-03-03         1
Desired result:
> desired.df
  customerId maxConsecutiveFailures
1          1                      0
2          2                      1
3          3                      2
I'm flailing quite a bit and searching through other rle questions isn't helping me yet - this is what I was "expecting" a solution to resemble:
df %>%
  group_by(customerId) %>%
  summarise(maxConsecutiveFailures =
              max(rle(isFailure[isFailure == 1])$lengths))
Recommended answer
We group by 'customerId' and use do() to perform rle() on the 'isFailure' column. We then extract the lengths of the runs whose values are TRUE (lengths[values]) and create the 'Max' column with an if/else condition that returns 0 for customers that didn't have any 1 values.
df %>%
  group_by(customerId) %>%
  do({
    # lengths of the runs where isFailure == 1
    tmp <- with(rle(.$isFailure == 1), lengths[values])
    data.frame(customerId = .$customerId,
               Max = if (length(tmp) == 0) 0 else max(tmp))
  }) %>%
  slice(1L)  # keep one row per customer
#   customerId Max
# 1          1   0
# 2          2   1
# 3          3   2
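The same idea can probably be written closer to the shape the question was "expecting", using summarise() instead of do(). The following is a minimal sketch, not part of the original answer: max_run is a hypothetical helper name introduced here for illustration, and the rows are assumed to already be in date order within each customer.

```r
library(dplyr)

df <- data.frame(
  customerId = c(1, 2, 2, 3, 3, 3),
  isFailure  = c(0, 0, 1, 0, 1, 1)
)

# Hypothetical helper (not from the original answer): length of the
# longest run of 1s in x, or 0 when x contains no 1s.
max_run <- function(x) {
  runs <- rle(x == 1)
  # runs$lengths[runs$values] keeps only the runs of TRUE (failures);
  # the extra 0 covers the case where that selection is empty.
  max(runs$lengths[runs$values], 0)
}

result <- df %>%
  group_by(customerId) %>%
  summarise(maxConsecutiveFailures = max_run(isFailure))
```

Passing 0 as a second argument to max() plays the role of the if/else guard above, and calling rle() on the full logical vector (rather than on isFailure[isFailure == 1]) is what keeps separate failure runs from being merged into one.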