Variable definition with mutate that depends on its value in the previous row
Problem description
I have data in the following format:
   time click interaction
1   407 FALSE        TRUE
2   408  TRUE        TRUE
3   409 FALSE       FALSE
4   410 FALSE       FALSE
5   411 FALSE       FALSE
6   412 FALSE       FALSE
7   413 FALSE       FALSE
8   414 FALSE       FALSE
9   415 FALSE       FALSE
10  416 FALSE       FALSE
11  417 FALSE       FALSE
12  418 FALSE       FALSE
13  419 FALSE       FALSE
14  420 FALSE       FALSE
15  421 FALSE       FALSE
16  422 FALSE       FALSE
17  423 FALSE       FALSE
18  424 FALSE       FALSE
19  425 FALSE       FALSE
20  426 FALSE       FALSE
21  427 FALSE       FALSE
22  428 FALSE       FALSE
23  429 FALSE       FALSE
24  430 FALSE       FALSE
25  431 FALSE       FALSE
26  432 FALSE       FALSE
27  433 FALSE       FALSE
28  434 FALSE       FALSE
29  435 FALSE        TRUE
30  436 FALSE       FALSE
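To make the example reproducible, the printed table can be rebuilt as an R data frame. This is a sketch. The name dat is an assumption chosen to match the answer's code, since the question's own attempt calls the data activity. The values are copied by hand from the printout.

```r
# Rebuild the printed data; the name `dat` is assumed (the question uses `activity`)
dat <- data.frame(
  time        = 407:436,                                   # one row per second
  click       = c(FALSE, TRUE, rep(FALSE, 28)),            # the only click is at 408
  interaction = c(TRUE, TRUE, rep(FALSE, 26), TRUE, FALSE) # interaction at 407, 408 and 435
)
str(dat)
```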
The data record how a user interacts with an application, one row per second. click is TRUE when the user clicks. interaction is TRUE when there is any interaction at all, either a click or another event such as typing or scrolling. I'd like to compute a new variable that is TRUE from the second after a click until the user starts interacting again, as long as nothing happens in between.
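So for this new variable, I want it to be TRUE if there was:
- a click in the previous second and no interaction (click or otherwise) in the current second, or
- no interaction after a click in the previous second (that is, the new variable was already TRUE), and still no interaction in the current second.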
I tried something like this with dplyr:
activity %>% mutate(
nothing.after.click = (lag(click) == TRUE & interaction == FALSE) |
(lag(nothing.after.click) == TRUE & interaction == FALSE)
)
Unfortunately, it doesn't work. It fails with "Error: object 'nothing.after.click' not found". How can I do this? If it isn't possible with dplyr, I'd welcome a solution that uses something else.
Here is the output I want:
   time click interaction nothing.after.click
1   407 FALSE        TRUE               FALSE
2   408  TRUE        TRUE               FALSE
3   409 FALSE       FALSE                TRUE
4   410 FALSE       FALSE                TRUE
5   411 FALSE       FALSE                TRUE
6   412 FALSE       FALSE                TRUE
7   413 FALSE       FALSE                TRUE
8   414 FALSE       FALSE                TRUE
9   415 FALSE       FALSE                TRUE
10  416 FALSE       FALSE                TRUE
11  417 FALSE       FALSE                TRUE
12  418 FALSE       FALSE                TRUE
13  419 FALSE       FALSE                TRUE
14  420 FALSE       FALSE                TRUE
15  421 FALSE       FALSE                TRUE
16  422 FALSE       FALSE                TRUE
17  423 FALSE       FALSE                TRUE
18  424 FALSE       FALSE                TRUE
19  425 FALSE       FALSE                TRUE
20  426 FALSE       FALSE                TRUE
21  427 FALSE       FALSE                TRUE
22  428 FALSE       FALSE                TRUE
23  429 FALSE       FALSE                TRUE
24  430 FALSE       FALSE                TRUE
25  431 FALSE       FALSE                TRUE
26  432 FALSE       FALSE                TRUE
27  433 FALSE       FALSE                TRUE
28  434 FALSE       FALSE                TRUE
29  435 FALSE        TRUE               FALSE
30  436 FALSE       FALSE               FALSE
Ultimately, the goal is to keep only the rows where nothing.after.click is TRUE. If there's a different way to approach this problem, I'd welcome that too.
Recommended answer
You can't reference a variable in its own initial definition. Instead, we can build it up in several passes.
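The error occurs because mutate() evaluates each expression on whole columns at once. While the definition of nothing.after.click is being evaluated, that column does not exist yet, so lag() has nothing to read. If you want to write the recursion literally, you can compute the column with an explicit loop instead. This is not part of the original answer. In a loop, each row can see the value just computed for the previous row. The helper name nothing_after_click is hypothetical. Treating the first row as FALSE, because it has no previous second, is an assumption.

```r
library(dplyr)

# The question's data, rebuilt under the assumed name `dat`
dat <- data.frame(
  time        = 407:436,
  click       = c(FALSE, TRUE, rep(FALSE, 28)),
  interaction = c(TRUE, TRUE, rep(FALSE, 26), TRUE, FALSE)
)

# Hypothetical helper: applies the question's recursive rule row by row
nothing_after_click <- function(click, interaction) {
  n <- length(click)
  nac <- logical(n)                 # all FALSE; row 1 has no previous second
  for (i in seq_len(n)[-1]) {
    # TRUE when this second is quiet AND the previous second was either
    # a click or already part of a quiet-after-click span
    nac[i] <- !interaction[i] && (click[i - 1] || nac[i - 1])
  }
  nac
}

res <- dat %>% mutate(nac = nothing_after_click(click, interaction))
res$nac
```

The loop body corresponds directly to the question's two conditions. click[i - 1] covers "a click in the previous second", and nac[i - 1] covers "already in a quiet span after a click".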
When I look at your condition:
nothing.after.click = (lag(click) == TRUE & interaction == FALSE) |
(lag(nothing.after.click) == TRUE & interaction == FALSE)
I see that interaction == FALSE appears in both alternatives. So if interaction is TRUE, then nothing.after.click (nac from here on) is definitely FALSE. Otherwise, I can't tell yet, so I'll set it to NA. That's my first pass:
dat %>% mutate(nac = ifelse(interaction, FALSE, NA))
That takes care of the interaction == FALSE part. The next pass handles the lag(click) == TRUE part of your or clause. Any row that is NA, and therefore still undecided, becomes TRUE if lag(click) is TRUE. Otherwise we leave it untouched. (The == TRUE is redundant, so I left it out.)
dat %>% mutate(nac = ifelse(interaction, FALSE, NA),
nac = ifelse(lag(click) & is.na(nac), TRUE, nac))
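To see what the first two passes leave behind, the sketch below runs only those two steps on the question's data, rebuilt here under the assumed name dat. Only the row right after the click is set to TRUE. The rest of the quiet span is still NA, and the final pass has to fill it in.

```r
library(dplyr)

dat <- data.frame(
  time        = 407:436,
  click       = c(FALSE, TRUE, rep(FALSE, 28)),
  interaction = c(TRUE, TRUE, rep(FALSE, 26), TRUE, FALSE)
)

# Passes 1 and 2 only. In row 1, lag(click) is NA, but NA & FALSE is FALSE
# in R, so the FALSE already decided there is kept
two_passes <- dat %>%
  mutate(nac = ifelse(interaction, FALSE, NA),
         nac = ifelse(lag(click) & is.na(nac), TRUE, nac))
two_passes$nac
```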
The last pass handles the lag(nac) part. Anything that is still undecided is set to the most recent decided value. This is a job for zoo::na.locf (locf stands for "last observation carried forward"):
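library(dplyr)  # mutate(), lag() and %>%
library(zoo)    # na.locf()
dat %>% mutate(nac = ifelse(interaction, FALSE, NA),             # pass 1: FALSE on any interaction, else undecided (NA)
               nac = ifelse(lag(click) & is.na(nac), TRUE, nac), # pass 2: undecided row right after a click -> TRUE
               nac = na.locf(nac))                               # pass 3: carry the last decided value into remaining NAs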
#    time click interaction   nac
# 1   407 FALSE        TRUE FALSE
# 2   408  TRUE        TRUE FALSE
# 3   409 FALSE       FALSE  TRUE
# 4   410 FALSE       FALSE  TRUE
# 5   411 FALSE       FALSE  TRUE
# 6   412 FALSE       FALSE  TRUE
# 7   413 FALSE       FALSE  TRUE
# 8   414 FALSE       FALSE  TRUE
# 9   415 FALSE       FALSE  TRUE
# 10  416 FALSE       FALSE  TRUE
# 11  417 FALSE       FALSE  TRUE
# 12  418 FALSE       FALSE  TRUE
# 13  419 FALSE       FALSE  TRUE
# 14  420 FALSE       FALSE  TRUE
# 15  421 FALSE       FALSE  TRUE
# 16  422 FALSE       FALSE  TRUE
# 17  423 FALSE       FALSE  TRUE
# 18  424 FALSE       FALSE  TRUE
# 19  425 FALSE       FALSE  TRUE
# 20  426 FALSE       FALSE  TRUE
# 21  427 FALSE       FALSE  TRUE
# 22  428 FALSE       FALSE  TRUE
# 23  429 FALSE       FALSE  TRUE
# 24  430 FALSE       FALSE  TRUE
# 25  431 FALSE       FALSE  TRUE
# 26  432 FALSE       FALSE  TRUE
# 27  433 FALSE       FALSE  TRUE
# 28  434 FALSE       FALSE  TRUE
# 29  435 FALSE        TRUE FALSE
# 30  436 FALSE       FALSE FALSE
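The last pass depends on how zoo::na.locf treats NAs. The small vectors below are made up for illustration. Each NA takes the most recent non-NA value before it. Leading NAs have no earlier value to carry forward, so they are dropped by default (na.rm = TRUE).

```r
library(zoo)

# Each NA is replaced by the last non-NA value before it
x <- c(FALSE, NA, TRUE, NA, NA, FALSE, NA)
na.locf(x)                  # FALSE FALSE TRUE TRUE TRUE FALSE FALSE

# A leading NA has no earlier value to carry forward
y <- c(NA, TRUE, NA)
na.locf(y)                  # TRUE TRUE  (leading NA dropped, so the result is shorter)
na.locf(y, na.rm = FALSE)   # NA TRUE TRUE  (length preserved)
```

In the question's data, the first row has interaction == TRUE, so nac starts out decided and no values are dropped. A session might instead begin with a quiet second. In that case na.locf(nac) would return fewer values than there are rows, and mutate() would reject the result. Passing na.rm = FALSE keeps the length. You can then set the remaining leading NA to FALSE, for example with dplyr::coalesce(nac, FALSE).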
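The question also asks for other ways to think about the problem, so here is one more sketch, which is not from the original answer. It assumes, as the question states, that every click also counts as an interaction. Under that assumption, each interaction starts a new episode. The quiet rows of an episode are "nothing after click" exactly when the episode began with a click. Numbering the episodes with cumsum(interaction) turns the recursion into an ordinary grouped computation. The column name episode and the object names are hypothetical. The final filter() performs the filtering that the question names as its end goal.

```r
library(dplyr)

dat <- data.frame(
  time        = 407:436,
  click       = c(FALSE, TRUE, rep(FALSE, 28)),
  interaction = c(TRUE, TRUE, rep(FALSE, 26), TRUE, FALSE)
)

res <- dat %>%
  mutate(episode = cumsum(interaction)) %>%         # each interaction starts a new episode
  group_by(episode) %>%
  mutate(nac = !interaction & first(click)) %>%     # quiet rows of an episode that began with a click
  ungroup() %>%
  select(-episode)

# The question's end goal: keep only the quiet-after-click rows
quiet_after_click <- res %>% filter(nac)
quiet_after_click
```

Any rows before the first interaction fall into episode 0. Under the stated assumption those rows contain no clicks, so they stay FALSE.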