How to group sequential event time sequences (with breaks between events) to find the duration of events

Question

I have a data set in R with a series of people, the events that occur, and the time at which each event occurs, in seconds, starting from 0. It looks similar to this:

event seconds person
1      0.0    Bob
2     15.0    Bob
3     28.5    Bob
4     32.0    Joe
5     38.0    Joe
6     41.0    Joe
7     42.5    Joe
8     55.0    Anne
9     58.0    Anne

I need to filter by name, which means the event numbers for each person will not be consecutive.

An example of what this looks like (notice that Bob is not involved in events 4-40, and so on):

event seconds person
1      0.0     Bob
2      15.0    Bob
3      28.5    Bob
41     256.0   Bob
42     261.0   Bob
43     266.0   Bob
44     268.5   Bob
45     272.0   Bob
46     273.0   Bob
49     569.0   Bob
80     570.5   Bob
81     581.0   Bob

Events that are sequential and related have event numbers that differ by 1. I would like to find the duration of each group of related events. For example, events 1-3 form a group that lasts 28.5 seconds, and events 41-46 form another group that lasts 17 seconds. This is required for every name listed in the person column.

I have tried filtering the names using dplyr, then using as.matrix to find the difference between consecutive event rows and determine where the increment is greater than 1 (indicating the row is no longer part of the current sequence of events). I haven't found a way to assign the max and min based on this to determine the duration of related events. The solution does not need to involve this step, but it was the closest I could come.

The end goal is to plot the non-contiguous time durations for each person, to give a visual representation of each person's event involvement across the entire data set.

Thanks.

Recommended answer

Suppose first that we have just Bob's rows of the dataframe, called bob. We will assume bob is already sorted by event, in increasing order.

Along the same lines as you mentioned (looking at diff(event) > 1), you can additionally use cumsum to assign each event to the 'run' of events it belongs to:

library(plyr)
# start is 1 on the first event of each run; cumsum(start) numbers the runs
bob2 <- mutate(bob, start = c(1, diff(event) > 1), run = cumsum(start))
bob2
   event seconds person start run
1      1     0.0    Bob     1   1
2      2    15.0    Bob     0   1
3      3    28.5    Bob     0   1
4     41   256.0    Bob     1   2
5     42   261.0    Bob     0   2
6     43   266.0    Bob     0   2
7     44   268.5    Bob     0   2
8     45   272.0    Bob     0   2
9     46   273.0    Bob     0   2
10    49   569.0    Bob     1   3
11    80   570.5    Bob     1   4
12    81   581.0    Bob     0   4

start indicates whether the row starts a run of sequential events, and run identifies which run the row belongs to.
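
The flag-then-cumsum step can also be tried on a bare vector, without plyr. The following is a minimal base R sketch, not part of the original answer; the event vector is copied from Bob's rows in the question, and it uses != 1 instead of > 1, which is equivalent when the events are sorted and unique.

```r
# Event numbers for one person, already sorted in increasing order
# (values copied from Bob's rows in the question).
event <- c(1, 2, 3, 41, 42, 43, 44, 45, 46, 49, 80, 81)

# diff() gives the gap to the previous event; a gap other than 1 starts a new run.
# The first event always starts a run, hence the leading TRUE.
start <- c(TRUE, diff(event) != 1)

# cumsum() over the logical flags numbers the runs 1, 2, 3, ...
run <- cumsum(start)

print(data.frame(event, start, run))
```

Each TRUE in start bumps the running total by one, so every row in the same run ends up with the same run number.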

Then you can just find the duration of each run:

ddply(bob2, .(run), summarize, length=diff(range(seconds)))
  run length
1   1   28.5
2   2   17.0
3   3    0.0
4   4   10.5
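
For comparison, the same per-run durations can be computed in base R with tapply. This is a sketch under the assumption that only Bob's rows are present and sorted by event; the data frame below is rebuilt by hand from the question's table.

```r
# Bob's rows from the question, sorted by event.
bob <- data.frame(
  event   = c(1, 2, 3, 41, 42, 43, 44, 45, 46, 49, 80, 81),
  seconds = c(0, 15, 28.5, 256, 261, 266, 268.5, 272, 273, 569, 570.5, 581)
)

# Number the runs: a new run starts wherever the event number does not increase by 1.
bob$run <- cumsum(c(TRUE, diff(bob$event) != 1))

# Duration of each run = last time minus first time within the run.
durations <- tapply(bob$seconds, bob$run, function(s) diff(range(s)))

print(durations)
```

A run that contains a single event (event 49 here) gets a duration of 0, matching the ddply output.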

Now suppose you have your original dataframe with everyone mixed together in it. We can use ddply again to split it up by person:

# Number the runs within each person (rows assumed sorted by event), then summarize each run.
tmp <- ddply(df, .(person), transform, run=cumsum(c(1, diff(event) != 1)))
ddply(tmp, .(person, run), summarize, length=diff(range(seconds)), start_event=min(event), end_event=max(event))

    person run length start_event end_event
1   Anne   1    3.0           8         9
2    Bob   1   28.5           1         3
3    Bob   2   17.0          41        46
4    Bob   3    0.0          49        49
5    Bob   4   10.5          80        81
6    Joe   1   10.5           4         7

Note: my df is your bob table rbind-ed to your other table, then passed through unique() (just to show it works when there is more than one run per person). There is probably a clever way to do this that combines the two ddply calls (or uses the dplyr pipe-y syntax that I am not familiar with), but I do not know what it is.
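
One possible way to write the whole thing as a single dplyr pipeline is sketched below. This is not from the original answer: it assumes dplyr 1.0 or later is installed (for the .groups argument), and the df built here is a hand-made reconstruction of the question's two tables combined.

```r
library(dplyr)

# Reconstruction of the question's sample data: the first table plus Bob's later events.
df <- data.frame(
  event   = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 41, 42, 43, 44, 45, 46, 49, 80, 81),
  seconds = c(0, 15, 28.5, 32, 38, 41, 42.5, 55, 58,
              256, 261, 266, 268.5, 272, 273, 569, 570.5, 581),
  person  = c(rep("Bob", 3), rep("Joe", 4), rep("Anne", 2), rep("Bob", 9)),
  stringsAsFactors = FALSE
)

runs <- df %>%
  arrange(person, event) %>%                           # sort events within each person
  group_by(person) %>%
  mutate(run = cumsum(c(TRUE, diff(event) != 1))) %>%  # number the runs per person
  group_by(person, run) %>%
  summarise(
    length      = max(seconds) - min(seconds),         # duration of the run
    start_event = min(event),
    end_event   = max(event),
    .groups     = "drop"
  )

print(runs)
```

The result has one row per person and run, with the same columns as the ddply output, so it can feed a plot of each person's start and end times directly.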
