Variable definition with mutate that depends on its value in the previous row


Problem description

I have data in the following format:

   time click       interaction
1   407 FALSE              TRUE
2   408  TRUE              TRUE
3   409 FALSE             FALSE
4   410 FALSE             FALSE
5   411 FALSE             FALSE
6   412 FALSE             FALSE
7   413 FALSE             FALSE
8   414 FALSE             FALSE
9   415 FALSE             FALSE
10  416 FALSE             FALSE
11  417 FALSE             FALSE
12  418 FALSE             FALSE
13  419 FALSE             FALSE
14  420 FALSE             FALSE
15  421 FALSE             FALSE
16  422 FALSE             FALSE
17  423 FALSE             FALSE
18  424 FALSE             FALSE
19  425 FALSE             FALSE
20  426 FALSE             FALSE
21  427 FALSE             FALSE
22  428 FALSE             FALSE
23  429 FALSE             FALSE
24  430 FALSE             FALSE
25  431 FALSE             FALSE
26  432 FALSE             FALSE
27  433 FALSE             FALSE
28  434 FALSE             FALSE
29  435 FALSE              TRUE
30  436 FALSE             FALSE

It represents how a user interacts with an application each second: click marks clicks, and interaction is TRUE when there is any interaction at all (a click or another event such as typing or scrolling). I'd like to compute a new variable that is TRUE during the span after a click in which there is no interaction, up until the user starts interacting again.

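So for this new variable, I want it to be true if there was:

  • a click in the previous second and no interaction (click or otherwise) in the current second, or

  • no interaction after a click as of the previous second, and still no interaction in the current second.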

I tried something like this with dplyr:

activity %>% mutate(
    nothing.after.click = (lag(click) == TRUE & interaction == FALSE) |
        (lag(nothing.after.click) == TRUE & interaction == FALSE)
)

Unfortunately it doesn't work (it says "Error: object 'nothing.after.click' not found"). How can I do this? If it isn't possible with dplyr, I would welcome the use of something else.

This is my desired output:

   time click       interaction nothing.after.click
1   407 FALSE              TRUE               FALSE
2   408  TRUE              TRUE               FALSE
3   409 FALSE             FALSE                TRUE
4   410 FALSE             FALSE                TRUE
5   411 FALSE             FALSE                TRUE
6   412 FALSE             FALSE                TRUE
7   413 FALSE             FALSE                TRUE
8   414 FALSE             FALSE                TRUE
9   415 FALSE             FALSE                TRUE
10  416 FALSE             FALSE                TRUE
11  417 FALSE             FALSE                TRUE
12  418 FALSE             FALSE                TRUE
13  419 FALSE             FALSE                TRUE
14  420 FALSE             FALSE                TRUE
15  421 FALSE             FALSE                TRUE
16  422 FALSE             FALSE                TRUE
17  423 FALSE             FALSE                TRUE
18  424 FALSE             FALSE                TRUE
19  425 FALSE             FALSE                TRUE
20  426 FALSE             FALSE                TRUE
21  427 FALSE             FALSE                TRUE
22  428 FALSE             FALSE                TRUE
23  429 FALSE             FALSE                TRUE
24  430 FALSE             FALSE                TRUE
25  431 FALSE             FALSE                TRUE
26  432 FALSE             FALSE                TRUE
27  433 FALSE             FALSE                TRUE
28  434 FALSE             FALSE                TRUE
29  435 FALSE              TRUE               FALSE
30  436 FALSE             FALSE               FALSE

Ultimately, the goal is to filter the rows where nothing.after.click is true, so if there's another way to think about this problem I'd welcome that too.

Recommended answer

You can't reference a variable in its own initial definition. What we can do instead is build it up in multiple passes.

When I look at your condition:

nothing.after.click = (lag(click) == TRUE & interaction == FALSE) |
        (lag(nothing.after.click) == TRUE & interaction == FALSE)

I see that interaction == FALSE appears in both possibilities. So, if interaction is TRUE, then nothing.after.click (from here on out nac) is definitely FALSE. Otherwise, I'm not sure yet, so I'll set it to NA. That's my first pass:

dat %>% mutate(nac = ifelse(interaction, FALSE, NA))

We've taken care of the interaction == FALSE part; the next pass handles the lag(click) == TRUE part of your or clause. Anything that is still NA, and therefore undecided, becomes TRUE if lag(click) is TRUE; otherwise we leave it untouched. (== TRUE is redundant, so I left it out.)

dat %>% mutate(nac = ifelse(interaction, FALSE, NA),
               nac = ifelse(lag(click) & is.na(nac), TRUE, nac))
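
As a rough illustration of what these two passes leave behind, here is a base-R sketch on a short made-up vector. The click and interaction values and the prev_click helper are assumptions for illustration, not the question's data; c(NA, head(click, -1)) stands in for dplyr's lag(click).

```r
# Hypothetical toy data: one click at position 2, interaction at 1, 2 and 6.
click       <- c(FALSE, TRUE,  FALSE, FALSE, FALSE, FALSE, FALSE)
interaction <- c(TRUE,  TRUE,  FALSE, FALSE, FALSE, TRUE,  FALSE)

# Base-R stand-in for dplyr::lag(click): shift by one, pad the front with NA.
prev_click <- c(NA, head(click, -1))

# Pass 1: interaction means "definitely FALSE"; everything else is undecided.
nac1 <- ifelse(interaction, FALSE, NA)
# nac1: FALSE FALSE NA NA NA FALSE NA

# Pass 2: an undecided row right after a click becomes TRUE.
nac2 <- ifelse(prev_click & is.na(nac1), TRUE, nac1)
# nac2: FALSE FALSE TRUE NA NA FALSE NA

print(nac2)
```

The entries that are still NA after the second pass are the ones that have to inherit their value from the nearest decided row above them.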

The last pass is the lag(nac) part: anything that is still undefined is set to the previous defined value. This is a job for zoo::na.locf (locf stands for "last observation carried forward"):

library(dplyr)
library(zoo)
dat %>% mutate(nac = ifelse(interaction, FALSE, NA),
               nac = ifelse(lag(click) & is.na(nac), TRUE, nac),
               # na.rm = FALSE keeps any leading NA, so the length always matches
               nac = na.locf(nac, na.rm = FALSE))

#    time click interaction   nac
# 1   407 FALSE        TRUE FALSE
# 2   408  TRUE        TRUE FALSE
# 3   409 FALSE       FALSE  TRUE
# 4   410 FALSE       FALSE  TRUE
# 5   411 FALSE       FALSE  TRUE
# 6   412 FALSE       FALSE  TRUE
# 7   413 FALSE       FALSE  TRUE
# 8   414 FALSE       FALSE  TRUE
# 9   415 FALSE       FALSE  TRUE
# 10  416 FALSE       FALSE  TRUE
# 11  417 FALSE       FALSE  TRUE
# 12  418 FALSE       FALSE  TRUE
# 13  419 FALSE       FALSE  TRUE
# 14  420 FALSE       FALSE  TRUE
# 15  421 FALSE       FALSE  TRUE
# 16  422 FALSE       FALSE  TRUE
# 17  423 FALSE       FALSE  TRUE
# 18  424 FALSE       FALSE  TRUE
# 19  425 FALSE       FALSE  TRUE
# 20  426 FALSE       FALSE  TRUE
# 21  427 FALSE       FALSE  TRUE
# 22  428 FALSE       FALSE  TRUE
# 23  429 FALSE       FALSE  TRUE
# 24  430 FALSE       FALSE  TRUE
# 25  431 FALSE       FALSE  TRUE
# 26  432 FALSE       FALSE  TRUE
# 27  433 FALSE       FALSE  TRUE
# 28  434 FALSE       FALSE  TRUE
# 29  435 FALSE        TRUE FALSE
# 30  436 FALSE       FALSE FALSE
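
Since the end goal is to filter rows, another way to think about it is to apply the recurrence directly in a loop. The sketch below uses base R only. It is an alternative to the multi-pass approach above, not the answer's own method. The reconstruction of dat from the question's table and the choice to treat the first row as FALSE (it has no previous row) are assumptions.

```r
# Rebuild the question's 30 rows (times 407 to 436).
n <- 30
dat <- data.frame(
  time        = 407:436,
  click       = seq_len(n) == 2,               # only row 2 (time 408) is a click
  interaction = seq_len(n) %in% c(1, 2, 29)    # rows 1, 2 and 29 have interaction
)

# Direct recurrence: row i depends on the click and the state at row i - 1.
nac <- logical(nrow(dat))                      # starts all FALSE; row 1 stays FALSE
for (i in seq_len(nrow(dat))[-1]) {
  nac[i] <- !dat$interaction[i] && (dat$click[i - 1] || nac[i - 1])
}
dat$nothing.after.click <- nac

# The stated end goal: keep only the quiet span after a click.
quiet <- dat[dat$nothing.after.click, ]
print(range(quiet$time))
```

With the question's data this keeps rows 3 through 28 (times 409 to 434), matching the desired output. A plain loop is slower than vectorised code on very long inputs, but it states the dependency on the previous row explicitly.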
