Implementing an "at least" condition in a filter using R (dplyr)


Problem description

This question is related to my previous post: Consecutive exceedance above a threshold and additional conditions in R

Here is the data:

 dat <- structure(list(V1 = c(-3.85326, -2.88262, -4.1405, -3.95193, 
-6.68925, -2.04202, -2.47597, -4.91161, -2.5946, -2.82873, 2.68839, 
-4.1287, -4.50296, -0.143476, -1.12174, -0.756168, -1.67556, 
-1.92704, -1.89279, -2.37569, -5.71746, -2.7247, -4.12986, -2.29769, 
-1.52835, -2.63623, -2.31461, 2.32796, 4.14354, 4.47055, -0.557311, 
-0.425266, -2.37455, -5.97684, -5.22391, 0.374004, -0.986549, 
 2.36419, 0.218283, 2.66014, -3.44225, 3.46593, 1.3309, 0.679601, 
 5.42195, 10.6555, 8.34144, 1.64939, -1.64558, -0.754001, -4.77503, 
-6.66197, -4.07188, -1.72996, -1.15338, -8.05588, -6.58208, 1.32375, 
-3.69241, -5.23582, -4.33509, -7.43028, -3.57103, -10.4991, -8.68752, 
-8.98304, -8.96825, -7.99087, -8.25109, -6.48483, -6.09004, -7.05249, 
-4.78267)), class = "data.frame", row.names = c(NA, -73L))

What I want

I want to get the FIRST timestep satisfying the following modified conditions:

[1] V1 > 0 at the time step

[2] In the succeeding FOUR time steps (including the timestep in [1]), V1 > 0 in AT LEAST THREE timesteps

[3] Accumulated value of the next FOUR timesteps (including the timestep in [1]) should be greater than 1. 

Here is my script so far:

library(dplyr)

newx <- dat %>%
  as_tibble() %>%
  mutate(time = 1:n()) %>%
  filter(V1 > 0, dplyr::lead(V1, 1) > 0, dplyr::lead(V1, 2) > 0,
         (dplyr::lead(V1, 1) + dplyr::lead(V1, 2) + dplyr::lead(V1, 3) +
            dplyr::lead(V1, 4)) > 1)

Output

> newx
# A tibble: 7 x 2
    V1  time
   <dbl> <int>
1  2.33     28
2  2.36     38
3  3.47     42
4  1.33     43
5  0.680    44
6  5.42     45
7 10.7      46

Problem

I don't know how to implement the second condition correctly. It should check whether at least three of the four timesteps are > 0. It doesn't matter whether they are consecutive or not.

Expected output

The correct answer should be 28.

Any help would be appreciated.

Recommended answer

If I've understood correctly and you want the first row that meets your conditions, you can use zoo::rollsum:

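library(zoo)     # rollsum()
library(dplyr)   # %>%, filter(), slice()
library(tibble)  # rownames_to_column() comes from tibble, not dplyr

dat %>%
  rownames_to_column() %>%   # keep the row number as a "rowname" column
  filter(V1 > 0 &            # [1] positive at this time step
           rollsum(V1 > 0, 4, fill = NA, align = "left") >= 3 &  # [2] at least 3 of 4 positive
           rollsum(V1, 4, fill = NA, align = "left") > 1) %>%    # [3] 4-step sum above 1
  slice(1)                   # first matching row only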

  rowname      V1
1      28 2.32796
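
To see what the left-aligned rolling windows are doing, here is a minimal sketch on a small made-up vector. The vector `v` and the names `n_pos`, `total` and `first_hit` are purely illustrative and are not part of the question's data. With `align = "left"`, position i summarises `v[i]` through `v[i + 3]`, so the same window gives both the count of positive values and the accumulated value.

```r
library(zoo)

# Hypothetical toy series (not the question's data)
v <- c(-1, 2, -0.5, 3, 1, -2, -3)

# Left-aligned window of width 4: position i covers v[i], ..., v[i + 3].
# The last three positions have no complete window and are filled with NA.
n_pos <- rollsum(as.numeric(v > 0), 4, fill = NA, align = "left")  # positives per window
total <- rollsum(v, 4, fill = NA, align = "left")                  # sum per window

# First index where all three conditions hold; which() skips NA results
first_hit <- which(v > 0 & n_pos >= 3 & total > 1)[1]
first_hit
```

Here the window starting at position 2 is (2, -0.5, 3, 1): three positives and a sum of 5.5, so position 2 is the first hit. Position 4 is positive as well, but its window holds only two positives and sums to -1, so it is rejected.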
