R: Extracting the days following a signal in a time series
Problem description
In my example I have a data frame with three columns: date, signal and value. Now I want to mutate new columns that are conditioned on the signal.
If there was a signal on the previous day (ifelse(lag(signal) == 1, ...)), give me the values of the following two days (otherwise NA). In this case, however, I have three different signals (1, 2, 3).
With this code I get only the first following day for signal 1, but I also want the second following day. I also want to calculate multiple columns for the different signals (perhaps by crossing the number of following days with the signals).
df %>% mutate(calculation = ifelse(lag(signal) == 1,
value,
NA))
Here is my sample data:
library(tidyverse)
library(lubridate)
set.seed(123)
df <- tibble(date = today()+0:10,
signal = c(0,1,0,0,2,0,0,3,0,0,0),
value = sample.int(n=11))
# A tibble: 11 x 3
date signal value
<date> <dbl> <int>
1 2019-07-23 0 3
2 2019-07-24 1 11
3 2019-07-25 0 2
4 2019-07-26 0 6
5 2019-07-27 2 10
6 2019-07-28 0 5
7 2019-07-29 0 4
8 2019-07-30 3 9
9 2019-07-31 0 8
10 2019-08-01 0 1
11 2019-08-02 0 7
This is my desired output:
# A tibble: 11 x 3
date signal value new_col_day1_sig_1 new_col_day2_sig_1 new_col_day1_sig_2
<date> <dbl> <int>
1 2019-07-23 0 3 NA NA NA
2 2019-07-24 1 11 NA NA NA
3 2019-07-25 0 2 2 2 NA
4 2019-07-26 0 6 NA 6 NA
5 2019-07-27 2 10 NA NA NA
6 2019-07-28 0 5 NA NA 5
7 2019-07-29 0 4 NA NA NA
8 2019-07-30 3 9 NA NA NA
9 2019-07-31 0 8 NA NA NA
10 2019-08-01 0 1 NA NA NA
11 2019-08-02 0 7 NA NA NA
... and so on (the next columns should be new_col_day2_sig_2, new_col_day1_sig_3, new_col_day2_sig_3)
I would like a dynamic solution, because I want not only the following two days but up to seven consecutive days. The solution should also handle the different signals (1, 2, 3).
The solution should also be efficient.
Can you help me solve my problem?
Recommended answer
df %>%
mutate(calculation=ifelse( (lag(signal, 2) == 1) | (lag(signal) == 1), value, NA))
This is of course not good enough, since you want an extensible solution. Let us try harder:
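# TRUE where the signal equalled 1 on any of the previous n rows
anylag <- function(x, n) {
  l <- lapply(seq_len(n), function(i) lag(x, i) == 1)  # one logical vector per lag
  Reduce("|", l)                                       # element-wise OR across all lags
}
# Keep value on the 3 rows after each signal 1, NA elsewhere
df %>% mutate(calculation = ifelse(anylag(signal, 3), value, NA))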
Result (on a sample in which signal 1 occurs in rows 2 and 7):
# A tibble: 11 x 4
date signal value calculation
<date> <dbl> <int> <int>
1 2019-07-19 0 4 NA
2 2019-07-20 1 8 NA
3 2019-07-21 0 11 11
4 2019-07-22 0 10 10
5 2019-07-23 0 7 7
6 2019-07-24 0 1 NA
7 2019-07-25 1 3 NA
8 2019-07-26 0 9 9
9 2019-07-27 0 2 2
10 2019-07-28 0 6 6
11 2019-07-29 0 5 NA
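The anylag() helper above only looks for signal 1 and produces a single column, while the question asks for one column per signal and per number of following days. A minimal sketch of one way to generalise it follows. The names within_days() and add_signal_windows() are hypothetical and not part of the original answer, the sample values are hard-coded from the question's printed table so the result is reproducible, and the columns are assumed to be cumulative (dayK covers days 1 to K after the signal), which is how the desired output for new_col_day2_sig_1 reads.

```r
library(dplyr)

# Sample data copied from the question's printed table (assumption: fixed
# dates and integer columns instead of today() and sample.int())
df <- tibble(
  date   = as.Date("2019-07-23") + 0:10,
  signal = c(0L, 1L, 0L, 0L, 2L, 0L, 0L, 3L, 0L, 0L, 0L),
  value  = c(3L, 11L, 2L, 6L, 10L, 5L, 4L, 9L, 8L, 1L, 7L)
)

# Hypothetical helper: TRUE where `target` occurred on any of the previous n rows
within_days <- function(x, target, n) {
  hits <- lapply(seq_len(n), function(i) dplyr::lag(x, i) == target)
  Reduce(`|`, hits)
}

# Hypothetical helper: add one column per (signal, window length) pair,
# named like the columns in the desired output
add_signal_windows <- function(data, signals, max_days) {
  for (s in signals) {
    for (k in seq_len(max_days)) {
      col <- paste0("new_col_day", k, "_sig_", s)
      hit <- within_days(data$signal, s, k)
      # FALSE and NA in `hit` both end up as NA in the new column
      data[[col]] <- ifelse(hit, data$value, NA_integer_)
    }
  }
  data
}

res <- add_signal_windows(df, signals = 1:3, max_days = 2)
print(res)
```

Setting max_days = 7 gives the seven-day variant mentioned in the question. If each column should instead hold only the k-th following day, replacing the within_days() call with dplyr::lag(data$signal, k) == s would be the corresponding change.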
Note: your signal column is of type double. You should never use == or %in% to compare doubles, because floating-point precision is limited. Either convert it to integer or use all.equal(). Consider this:
> 3*.1 / 3 * 10
[1] 1
> 3*.1 / 3 * 10 == 1
[1] FALSE
> all.equal(3*.1 / 3 * 10, 1)
[1] TRUE
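One practical caveat when applying this inside ifelse() or mutate(): all.equal() compares two whole objects and returns either TRUE or a character description of the difference, so it is not an element-wise test. The sketch below shows two element-wise alternatives, dplyr::near() and an integer conversion. The variable names are illustrative only, and the signal vector is copied from the question.

```r
library(dplyr)

x <- 3 * 0.1 / 3 * 10
exact  <- x == 1                    # FALSE, as in the console output above
scalar <- isTRUE(all.equal(x, 1))   # TRUE: comparison with a tolerance

signal <- c(0, 1, 0, 0, 2, 0, 0, 3, 0, 0, 0)

# Element-wise comparison with a tolerance
is_one_near <- near(signal, 1)

# Element-wise comparison after converting to integer; as.integer() truncates,
# so this assumes the doubles are whole numbers to begin with
is_one_int <- as.integer(signal) == 1L

print(which(is_one_near))
```

Either logical vector can be used as the condition in the ifelse() calls shown earlier.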