R: Extracting the following days after a signal in a time series


Problem description

In my example I have a data frame with three columns: date, signal and value. Now I want to mutate new columns that are conditioned on the signals.

If there was a signal on the previous day (that is, lag(signal) == 1), I want the values of the two following days (otherwise NA). In this case, however, I have three different signals (1, 2, 3).

With this code I only get the first following day for signal 1. I also want the second following day, and I want to calculate multiple columns for the different signals (maybe by crossing the number of following days with the signals).

df %>% mutate(calculation = ifelse(lag(signal) == 1,
                                   value,
                                   NA))

Here is my example data:

library(tidyverse)
library(lubridate)

set.seed(123)

df <- tibble(date   = today()+0:10,
             signal = c(0,1,0,0,2,0,0,3,0,0,0),
             value  = sample.int(n=11))
# A tibble: 11 x 3
   date       signal value
   <date>      <dbl> <int>
 1 2019-07-23      0     3
 2 2019-07-24      1    11
 3 2019-07-25      0     2
 4 2019-07-26      0     6
 5 2019-07-27      2    10
 6 2019-07-28      0     5
 7 2019-07-29      0     4
 8 2019-07-30      3     9
 9 2019-07-31      0     8
10 2019-08-01      0     1
11 2019-08-02      0     7

This is my desired output:

# A tibble: 11 x 3
   date       signal value   new_col_day1_sig_1  new_col_day2_sig_1  new_col_day1_sig_2
   <date>      <dbl> <int>
 1 2019-07-23      0     3                 NA                   NA                   NA
 2 2019-07-24      1    11                 NA                   NA                   NA
 3 2019-07-25      0     2                  2                    2                   NA
 4 2019-07-26      0     6                 NA                    6                   NA
 5 2019-07-27      2    10                 NA                   NA                   NA
 6 2019-07-28      0     5                 NA                   NA                    5
 7 2019-07-29      0     4                 NA                   NA                   NA
 8 2019-07-30      3     9                 NA                   NA                   NA
 9 2019-07-31      0     8                 NA                   NA                   NA
10 2019-08-01      0     1                 NA                   NA                   NA
11 2019-08-02      0     7                 NA                   NA                   NA



...and so on (the next columns should be new_col_day2_sig_2, new_col_day1_sig_3, new_col_day2_sig_3).

I would like a dynamic solution, because I want not only the following two days but up to seven consecutive days. The solution should also handle the different signals (1, 2, 3).

The solution should also be efficient.

Can you help me solve my problem?

Recommended answer

df %>% 
   mutate(calculation=ifelse( (lag(signal, 2) == 1) | (lag(signal) == 1), value, NA))

This is of course not good enough, since you want an extensible solution. Let us try harder:

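# TRUE where any of the previous n rows carried signal 1
# (lag() here is dplyr::lag, loaded via the tidyverse)
anylag <- function(x, n) {
  l <- lapply(1:n, function(i) lag(x, i) == 1)
  Reduce("|", l)
}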

df %>% mutate(calculation=ifelse(anylag(signal, 3), value, NA))

Result (the data printed here differs from the sample data in the question):

# A tibble: 11 x 4
   date       signal value calculation
   <date>      <dbl> <int>       <int>
 1 2019-07-19      0     4          NA
 2 2019-07-20      1     8          NA
 3 2019-07-21      0    11          11
 4 2019-07-22      0    10          10
 5 2019-07-23      0     7           7
 6 2019-07-24      0     1          NA
 7 2019-07-25      1     3          NA
 8 2019-07-26      0     9           9
 9 2019-07-27      0     2           2
10 2019-07-28      0     6           6
11 2019-07-29      0     5          NA
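The question asks for one column per signal and per following day, which the single calculation column above does not provide. A minimal sketch of one way to generate those columns follows, using base R only so that it does not depend on which lag() is on the search path. The helper names shift_down and add_signal_days are hypothetical, and so is the cumulative argument: with cumulative = FALSE each column holds only the value exactly d days after the signal, while cumulative = TRUE fills every day from 1 up to d (which is how the new_col_day2_sig_1 column in the question's desired output appears to be filled). The data values are copied from the tibble printed in the question.

```r
# Fixed example data (values copied from the question's printed tibble).
df <- data.frame(
  date   = as.Date("2019-07-23") + 0:10,
  signal = c(0L, 1L, 0L, 0L, 2L, 0L, 0L, 3L, 0L, 0L, 0L),
  value  = c(3L, 11L, 2L, 6L, 10L, 5L, 4L, 9L, 8L, 1L, 7L)
)

# Shift a vector down by k positions, padding the top with NA
# (the same idea as dplyr::lag(x, k), written in base R).
shift_down <- function(x, k) {
  n <- length(x)
  if (k >= n) return(rep(NA, n))
  c(rep(NA, k), x[seq_len(n - k)])
}

# Hypothetical helper: add one column per (signal, day offset) pair.
add_signal_days <- function(data, signals = 1:3, n_days = 2,
                            cumulative = FALSE) {
  for (s in signals) {
    seen <- rep(FALSE, nrow(data))            # rows already matched for signal s
    for (d in seq_len(n_days)) {
      hit <- shift_down(data$signal, d) == s  # signal s occurred exactly d days earlier
      hit <- !is.na(hit) & hit                # treat the NA padding as "no signal"
      if (cumulative) {
        seen <- seen | hit                    # keep days 1..d, not only day d
        hit  <- seen
      }
      col <- sprintf("new_col_day%d_sig_%d", d, s)
      data[[col]] <- ifelse(hit, data$value, NA_integer_)
    }
  }
  data
}

# Exactly-d-days variant for all three signals and two following days.
out <- add_signal_days(df, signals = 1:3, n_days = 2)

# Cumulative variant for signal 1 only.
out_cum <- add_signal_days(df, signals = 1, n_days = 2, cumulative = TRUE)
```

Raising n_days to 7 gives the seven-day version the question mentions. Each column costs one vectorised shift and comparison, so the work grows with the number of signals times the number of days.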

Note: your signal is of type double. You should never use == or %in% to compare doubles, because floating-point precision is limited. Either convert the column to integer or use all.equal(). Consider this:

> 3*.1 / 3 * 10 
[1] 1
> 3*.1 / 3 * 10 == 1
[1] FALSE
> all.equal(3*.1 / 3 * 10, 1)
[1] TRUE
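Both remedies from the note can be sketched as follows. This is only an illustration: the near_equal helper and its default tolerance are assumptions made here, not part of the original answer. Keep in mind that all.equal() compares whole objects and returns either TRUE or a character description of the difference, so it is usually wrapped in isTRUE() and is not an element-wise test.

```r
signal <- c(0, 1, 0, 0, 2, 0, 0, 3, 0, 0, 0)  # doubles, as in the question

# Option 1: store the signal as integer, so that == is an exact comparison.
# round() guards against values such as 0.9999999 being truncated to 0.
signal_int <- as.integer(round(signal))
is_sig_1   <- signal_int == 1L

# Option 2: compare doubles element-wise with a tolerance.
# near_equal is a hypothetical helper; the tolerance is an assumed default.
near_equal <- function(x, target, tol = sqrt(.Machine$double.eps)) {
  abs(x - target) < tol
}

x <- 3 * 0.1 / 3 * 10
near_equal(x, 1)          # tolerant, element-wise comparison
isTRUE(all.equal(x, 1))   # tolerant, whole-object comparison
```

If dplyr is already loaded, dplyr::near() provides an element-wise tolerant comparison of the same kind.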
