Filter for events that occur within a time range of event "A" in R, part 2


Question

This is a follow-up to a question asked previously (Filter for events that occur within a time range of event "A" in r). Since the original post was answered correctly, I decided to start a new question. If this is improper, let me know.

Quick recap: I have event data with a time value in seconds. I wanted to keep all B events that occur within 5 seconds before an A event.

The issue I've run into is that the data is split into periods, and the seconds restart in each period. I didn't think this would matter because the data was sorted, so I left the period column out of my original question, but there have been some unexpected results.

Here is a sample of the data with a period column added.

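library(dplyr)   # %>%, sample_n(), mutate(), select(), arrange()
library(tibble)  # tibble()

set.seed(123)
# Draw 100 of the 120 possible seconds, then give each row a random
# period (1-3) and a random event ("A" about 10% of the time)
event_df <- tibble(time_sec = c(1:120)) %>% 
  sample_n(100) %>%
  mutate(period = sample(c(1,2,3),
                       size = 100,
                       replace = TRUE),
         event = sample(c("A","B"), 
                        size = 100, 
                        replace = TRUE, 
                        prob = c(0.1,0.9))) %>% 
  select(period, time_sec, event) %>% 
  arrange(period, time_sec)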

When I use the solution that originally worked...

event_df %>%
  group_by(grp =  lag(cumsum(event == 'A'), default = 0)) %>% 
  filter((last(time_sec) - time_sec) <=5)

...you'll notice that it works correctly, except that the first A event of each period also grabs the B events that follow the last A event of the prior period, regardless of their time. For example, grp 4 looks like this:

~period, ~time_sec, ~event, ~grp
1        111,       "B"    4
1        114,       "B"    4
1        120,       "B"    4
2        79,        "B"    4
2        83,        "A"    4

The expected output for grp 4 would be:

~period, ~time_sec, ~event, ~grp
2        79,        "B"    4
2        83,        "A"    4
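
To see why one grp can hold rows from two periods, it may help to build the grouping column on a tiny hand-made data set. The sketch below uses hypothetical toy values (not the sampled data from the question) and only shows how lag(cumsum(event == "A")) assigns groups.

```r
library(dplyr)

# Hypothetical toy data: two periods, with seconds restarting in period 2
toy <- tibble(
  period   = c(1, 1, 1, 2, 2, 2),
  time_sec = c(10, 12, 118, 79, 83, 90),
  event    = c("B", "A", "B", "B", "A", "B")
)

# cumsum() counts the A events seen so far; lag() shifts that count down one
# row so each A lands in the same group as the B events that precede it
toy_grp <- toy %>%
  mutate(grp = lag(cumsum(event == "A"), default = 0))

print(toy_grp)
```

Here grp 1 contains the period 1 event at 118 seconds together with the period 2 events at 79 and 83 seconds. Because 83 - 118 is negative, that stray row passes a `<= 5` test on its own.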

I tried grouping by period as well, thinking this would solve the issue. While it filtered out most of the unwanted events, it still kept the last event from the previous period.

event_df %>%
  group_by(period,
           grp =  lag(cumsum(event == 'A'), default = 0)) %>% 
  filter((last(time_sec) - time_sec) <=5)

Result:

~period, ~time_sec, ~event, ~grp
1        120,       "B"    4
2        79,        "B"    4
2        83,        "A"    4

Closer, but it is still grabbing the last event from the previous period.

Update: I realized that those rows were included because their time difference was negative. Adding a lower bound solves it, except that there is still a final group with no A event.

event_df %>%
  group_by(grp =  lag(cumsum(event == 'A'), default = 0)) %>% 
  filter((last(time_sec) - time_sec) <=5 & (last(time_sec) - time_sec) >= 0 )
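
As a rough check of the update above, the same two-sided filter can be run on a small hand-made data set. The toy values below are hypothetical and chosen so that the last row is a B event with no A after it.

```r
library(dplyr)

# Hypothetical toy data: two periods, ending with a B that has no later A
toy <- tibble(
  period   = c(1, 1, 1, 2, 2, 2),
  time_sec = c(10, 12, 118, 79, 83, 90),
  event    = c("B", "A", "B", "B", "A", "B")
)

# Same filter as in the update: keep rows 0 to 5 seconds before the
# last row of their group
updated <- toy %>%
  group_by(grp = lag(cumsum(event == "A"), default = 0)) %>%
  filter((last(time_sec) - time_sec) <= 5 & (last(time_sec) - time_sec) >= 0) %>%
  ungroup()

print(updated)
```

The period 1 row at 118 seconds is now dropped because its difference is negative, but the trailing B at 90 seconds survives: it is the last row of its own group, so its difference is 0.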

Recommended answer

Since you added period to group_by(), your grp values span period boundaries, so a single grp can be split across two periods. If the part of a grp that falls in a given period doesn't end in an event "A", last(time_sec) is taken from an event "B" instead. The filter therefore always returns the final row of that period, plus any other "B" events within 5 seconds of it. A simple solution (it works for the toy data; I'm not sure about the real data) is to modify the filter() call so that it uses the true last value, which is the time of the only event "A" in the grp:

event_df %>%
  group_by(grp =  lag(cumsum(event == 'A'), default = 0), period) %>% 
  filter((last(time_sec[event=='A']) - time_sec) <=5)

This works because last(time_sec[event=='A']) is NA when a (period, grp) pair contains no event "A" observations, and filter() drops rows whose condition evaluates to NA.
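
The NA behavior can be seen on a small example. This is a minimal sketch on hypothetical toy data (not the sampled data from the question); the column name last_a is introduced here only for illustration.

```r
library(dplyr)

# Hypothetical toy data: two periods, seconds restart in period 2
toy <- tibble(
  period   = c(1, 1, 1, 2, 2, 2),
  time_sec = c(10, 12, 118, 79, 83, 90),
  event    = c("B", "A", "B", "B", "A", "B")
)

# Show the reference time each row is compared against:
# it is NA wherever the (grp, period) group contains no A event
ref <- toy %>%
  group_by(grp = lag(cumsum(event == "A"), default = 0), period) %>%
  mutate(last_a = last(time_sec[event == "A"])) %>%
  ungroup()

# The filter from the answer: an NA comparison result drops the row
fixed <- toy %>%
  group_by(grp = lag(cumsum(event == "A"), default = 0), period) %>%
  filter((last(time_sec[event == "A"]) - time_sec) <= 5) %>%
  ungroup()

print(ref)
print(fixed)
```

In this toy data, the rows at 118 seconds (period 1) and 90 seconds (period 2) have no A event in their group, so last_a is NA for them and both are removed, leaving the rows at 10, 12, 79, and 83 seconds.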
