Identify consecutive sequences based on a given variable

Published: 2020/10/16 21:32:34 · Tags: r, dataframe


Problem description

I am stuck on this. df1 has the following variables:

serial = a group of people

id1 = the person within the group (e.g. serial 12, id1 1 = group 12, person 1; serial 12, id1 2 = group 12, person 2, etc.)

Day = the day on which recording started

Each day consists of the same number of observations (e.g. 95):

        day1 (Monday)  =  day11-day196 
        day2 (Tuesday) = day21-day296     
        day3 (Wednesday) =  day31-day396   
        day4 (Thursday) =  day41-day496   
        day5 (Friday) = day51-day596      
        day6 (Saturday) = day61-day696   
        day7 (Sunday) =  day71-day796
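If the raw data hold one column per observation slot, the per-day counts day1 to day7 can be obtained by splitting the slot columns into consecutive blocks of 95. This is a minimal sketch under stated assumptions: the slot columns are ordered Monday first, then Tuesday, and so on; a "record" is any non-zero slot value; the helper count_daily_records and the toy matrix are hypothetical and not taken from the question.

```r
# Hypothetical helper: collapse 7 x n_slots slot columns into day1..day7 counts.
# Assumes columns are ordered Monday slots first, then Tuesday, and so on,
# and that any non-zero slot value counts as one record.
count_daily_records <- function(slots, n_slots = 95) {
  slots <- as.matrix(slots)
  stopifnot(ncol(slots) == 7 * n_slots)
  day_of_col <- rep(seq_len(7), each = n_slots)   # day index of every column
  recorded <- slots != 0                           # TRUE where a record exists
  counts <- sapply(seq_len(7), function(d)
    rowSums(recorded[, day_of_col == d, drop = FALSE]))
  counts <- matrix(counts, nrow = nrow(slots))     # keep a matrix even for one row
  colnames(counts) <- paste0("day", seq_len(7))
  counts
}

# Toy example (made-up values): one person, 95 empty slots per day,
# with a few records set by hand in Monday to Thursday slots
toy <- matrix(0, nrow = 1, ncol = 7 * 95)
toy[1, c(1, 50, 96, 200, 300)] <- 1
count_daily_records(toy)   # day1 = 2, day2 = 1, day3 = 1, day4 = 1, day5..day7 = 0
```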

Example of df1:

serial id1  Day     day1 day2 day3 day4 day5 day6 day7
12      1   Monday    2    1    2    1    1    3    1
123     1   Tuesday   0    3    0    3    3    0    3
10      1   Wednesday 0    3    3    3    3    3    3

I would like to identify the consecutive records (no gap between the daily records) and the total number of records.

The starting day for consecutive recording is given by the Day variable. For example, serial 12 is a consecutive record: recording started on Monday, and there are records (at least one of the 95 variables) on every day of the week. Over the week (7 x 95 variables), 11 records were made.
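The rule above can be written as a small check: take the day columns from the start day given by Day through day7 and require every one of them to be non-zero. A minimal sketch, assuming the day columns are named day1 to day7 and Day holds an English weekday name; is_consecutive is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: is a row consecutive from its start day through day7?
is_consecutive <- function(counts, start_day) {
  wkdays <- c("Monday", "Tuesday", "Wednesday", "Thursday",
              "Friday", "Saturday", "Sunday")
  start <- match(as.character(start_day), wkdays)  # 1 = Monday, ..., 7 = Sunday
  all(counts[start:7] != 0)                        # no gap day from the start on
}

# Serial 12 from the example: starts Monday, every day has records
serial12 <- c(day1 = 2, day2 = 1, day3 = 2, day4 = 1, day5 = 1, day6 = 3, day7 = 1)
is_consecutive(serial12, "Monday")   # TRUE
sum(serial12)                        # 11 records over the week
```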

A non-consecutive record is serial 123, because there are gap days on day3 and day6: recording started on Tuesday, and there is a gap on Wednesday and on Saturday.
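To see why a row fails the check, one can list the gap days (zero counts) between the start day and Sunday. A minimal sketch, assuming the same day1 to day7 layout and English weekday names in Day; gap_days is a hypothetical helper.

```r
# Hypothetical helper: names of the gap days (zero counts) from the start day on
gap_days <- function(counts, start_day) {
  wkdays <- c("Monday", "Tuesday", "Wednesday", "Thursday",
              "Friday", "Saturday", "Sunday")
  idx <- match(as.character(start_day), wkdays):7   # start day through Sunday
  wkdays[idx][counts[idx] == 0]                     # weekdays with no records
}

# Serial 123 from the example: starts Tuesday
serial123 <- c(day1 = 0, day2 = 3, day3 = 0, day4 = 3, day5 = 3, day6 = 0, day7 = 3)
gap_days(serial123, "Tuesday")   # "Wednesday" "Saturday"
```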

Finally, I would like to record the duration of the consecutive recording.
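In the sample output, Duration is the total number of records over the week, i.e. the row sum of day1 to day7 (11 for serial 12, 12 for serial 123, 18 for serial 10). A minimal base-R sketch, assuming the day columns are named day1 to day7:

```r
# Daily record counts copied from the example df1 (one row per serial/id1)
days <- data.frame(day1 = c(2, 0, 0), day2 = c(1, 3, 3), day3 = c(2, 0, 3),
                   day4 = c(1, 3, 3), day5 = c(1, 3, 3), day6 = c(3, 0, 3),
                   day7 = c(1, 3, 3))
# Total records per row across the whole week
duration <- rowSums(days[paste0("day", 1:7)])
unname(duration)   # 11 12 18
```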

Sample output:

serial  id1  Duration  Occurance  Days
12      1    11        7          day1 day2 day3 day4 day5 day6 day7
123     1    12        0          0
10      1    18        5          day3 day4 day5 day6 day7

Sample data:

df1 <- structure(list(serial = c(12, 123, 10), id1 = c(1, 1, 1),
    Day = structure(1:3, .Label = c("Monday", "Tuesday", "Wednesday"), class = "factor"),
    day1 = c(2, 0, 0), day2 = c(1, 3, 3), day3 = c(2, 0, 3), day4 = c(1, 3, 3),
    day5 = c(1, 3, 3), day6 = c(3, 0, 3), day7 = c(1, 3, 3)),
    row.names = c(NA, 3L), class = "data.frame")

Similar post: R - Identify consecutive sequences

Recommended answer

We can use rleid from data.table to get the 'Occurance' column right:

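library(data.table)

# Weekday names, in the same order as the day1..day7 columns
wkdays <- c("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")

out1 <- do.call(rbind, Map(function(x, y) {
              i1 <- match(y, wkdays):length(x)  # day columns from the start day to day7
              i2 <- x[i1] != 0                  # TRUE where that day has records
              i3 <- all(i2)                     # consecutive only if there is no gap day
              grp1 <- rleid(i2)                 # run id of each recorded/gap stretch
              Days <- if(i3) tapply(names(x)[i1][i2], grp1[i2], FUN = paste, collapse = ' ') else ''
              Occurance <- if(i3) length(grp1[i2]) else 0
              data.frame(Occurance, Days)
            }, asplit(df1[-(1:3)], 1), df1$Day))  # one row of day counts per start day

# Duration = total number of records over the week
out1$Duration <- rowSums(df1[startsWith(names(df1), 'day')])
out1
 # Occurance                               Days Duration
 #1         7 day1 day2 day3 day4 day5 day6 day7       11
 #2         0                                          12
 #3         5           day3 day4 day5 day6 day7       18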
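If data.table is not available, the same Occurance, Days and Duration columns can be computed in base R. This is a minimal sketch, not the answer's method: it assumes df1 as in the sample data and uses plain indexing instead of rleid, which would only matter if individual runs of recorded days had to be labelled.

```r
# df1 rebuilt from the sample data in the question
df1 <- data.frame(serial = c(12, 123, 10), id1 = c(1, 1, 1),
                  Day = factor(c("Monday", "Tuesday", "Wednesday")),
                  day1 = c(2, 0, 0), day2 = c(1, 3, 3), day3 = c(2, 0, 3),
                  day4 = c(1, 3, 3), day5 = c(1, 3, 3), day6 = c(3, 0, 3),
                  day7 = c(1, 3, 3))

wkdays <- c("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")
day_cols <- paste0("day", 1:7)
counts <- as.matrix(df1[day_cols])
start <- match(as.character(df1$Day), wkdays)   # column index of each start day

# For each row: the days from the start day to day7, kept only if none is a gap
res <- lapply(seq_len(nrow(counts)), function(i) {
  idx <- start[i]:7
  if (all(counts[i, idx] != 0)) {
    data.frame(Occurance = length(idx), Days = paste(day_cols[idx], collapse = " "))
  } else {
    data.frame(Occurance = 0, Days = "")
  }
})
out2 <- cbind(df1[c("serial", "id1")], do.call(rbind, res),
              Duration = rowSums(counts))
out2
```

Because every row is checked only from its own start day, a gap before the start day (day1 for serial 10) does not break the sequence, which matches the sample output.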
