Subset only continuously increasing values to max value


Question

I am trying to subset numeric data by finding the start of the continuously increasing run that leads up to the maximum, and stopping at the maximum.

Some example data:

if(!require(data.table)) {install.packages("data.table"); library(data.table)}
if(!require(zoo)) {install.packages("zoo"); library(zoo)}
if(!require(dplyr)) {install.packages("dplyr"); library(dplyr)}

depth <- c(1.1, 2, 1.6, 1.2, 1.6, 1.2, 1.5, 1.7, 2.1, 3.1, 3.8, 5.2, 6.1, 7.0, 6.9, 6.9, 6.9, 6.0, 4.3, 2.1, 2.0)
temp <- c(17.9, 17.9, 17.8, 17.9, 17.7, 17.9, 17.9, 17.8, 17.7, 17.6, 17.5, 17.3, 17.2, 17.1, 17.0, 16.9, 16.7, 16.9, 17.2, 17.5, 17.9)
testdf <- data.frame(depth = depth, temp = temp)

I have tried a few solutions. One does not work; the other works, but I feel it may have limitations in certain situations.

Solution 1 only finds 1:max. Similar solutions suggest removing any decreasing values, where diff would be negative. Neither is what I want.

setDT(testdf)[, .SD[1:which.max(depth)]]
    depth temp
 1:   1.1 17.9
 2:   2.0 17.9
 3:   1.6 17.8
 4:   1.2 17.9
 5:   1.6 17.7
 6:   1.2 17.9
 7:   1.5 17.9
 8:   1.7 17.8
 9:   2.1 17.7
10:   3.1 17.6
11:   3.8 17.5
12:   5.2 17.3
13:   6.1 17.2
14:   7.0 17.1

I am trying to return this instead:

    depth temp
 6:   1.2 17.9
 7:   1.5 17.9
 8:   1.7 17.8
 9:   2.1 17.7
10:   3.1 17.6
11:   3.8 17.5
12:   5.2 17.3
13:   6.1 17.2
14:   7.0 17.1

Solution 2 uses diff and a rollapply to bin an arbitrary number of rows (n = 10 here). In this specific use, I pad one extra row after the max index and set diff to 0 for those rows; otherwise the rollapply stops well below the max.

testdf$diff <- c(diff(testdf$depth), NA) # add diff column and NA to empty cell
testdf <- testdf[1:(which(testdf$depth == max(testdf$depth)) + 1),] # subset to max depth row, plus one
testdf$diff[(which(testdf$depth == max(testdf$depth))) : (which(testdf$depth == max(testdf$depth)) + 1)] <- 0 # set any diff entry after max depth to 0, for rollapply to work

testdf <- testdf %>% 
mutate(diff = rollapply(diff, width = 10, min, align = "left", fill = 0, na.rm = TRUE)) %>% 
filter(diff >= 0)

This returns what I want:

   depth temp diff
1    1.2 17.9    0
2    1.5 17.9    0
3    1.7 17.8    0
4    2.1 17.7    0
5    3.1 17.6    0
6    3.8 17.5    0
7    5.2 17.3    0
8    6.1 17.2    0
9    7.0 17.1    0
10   6.9 17.0    0 # an extra padded row

Because it relies on an arbitrary window, this solution may not always work. The ideal solution would seem to be to find the max index, walk back to the start of the last run of positive diff values, and subset that range, but I'm trying to find a way that doesn't involve looping.

A while loop works, but I was trying to avoid a loop.

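findmindepth <- function(x) {
  # x needs a diff column built as c(diff(x$depth), NA)
  maxi <- which.max(x$depth)   # index of the first maximum
  maxdi <- maxi - 1            # diff[maxdi] is the step from row maxdi to row maxdi + 1
  # walk back while the step is an increase; stop at row 1 to avoid indexing diff[0]
  while (maxdi > 0 && x$diff[maxdi] > 0) {
    maxdi <- maxdi - 1
  }
  mindi <- maxdi + 1
  # keep one padded row after the max (as in solution 2), without running past the last row
  x[mindi:min(maxi + 1, nrow(x)), ]
}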

Recommended answer

You can use run-length encoding with diff to find the start/end points of all increasing and decreasing runs:

which_max <- which.max(testdf$depth)
encoding <- rle(diff(testdf$depth) > 0)

# these contain the start/end indices of all continuously increasing/decreasing subsets
ends <- cumsum(encoding$lengths) + 1L
starts <- ends - encoding$lengths

# filter out the decreasing subsets
starts <- starts[encoding$values]
ends <- ends[encoding$values]

# find the one that contains the maximum
interval <- which(starts <= which_max & ends >= which_max)
out <- testdf[starts[interval]:ends[interval],]
out
   depth temp
6    1.2 17.9
7    1.5 17.9
8    1.7 17.8
9    2.1 17.7
10   3.1 17.6
11   3.8 17.5
12   5.2 17.3
13   6.1 17.2
14   7.0 17.1
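
To see what the filtered starts and ends hold before the interval lookup, the same steps can be wrapped in a small helper. This is only a sketch: increasing_runs is a hypothetical name that does not appear in the answer, and it assumes the input vector contains no NA values.

```r
# Hypothetical helper: start/end indices of every strictly increasing run.
# Assumes x is numeric with no NA values.
increasing_runs <- function(x) {
  enc <- rle(diff(x) > 0)              # runs of "step up" vs. "step not up"
  ends <- cumsum(enc$lengths) + 1L     # last element touched by each run
  starts <- ends - enc$lengths         # first element of each run
  data.frame(start = starts[enc$values], end = ends[enc$values])
}

depth <- c(1.1, 2, 1.6, 1.2, 1.6, 1.2, 1.5, 1.7, 2.1, 3.1, 3.8,
           5.2, 6.1, 7.0, 6.9, 6.9, 6.9, 6.0, 4.3, 2.1, 2.0)
runs <- increasing_runs(depth)
runs
#   start end
# 1     1   2
# 2     4   5
# 3     6  14
```

The run that contains the maximum is then the row where start <= which.max(depth) and end >= which.max(depth), here rows 6 to 14.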

Actually, if you only care about the subset that contains the maximum, you can do something simpler:

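which_max <- which.max(testdf$depth)
if (which_max == 1L) {
  out <- testdf[1L, , drop = FALSE]
} else {
  # walk backwards from the maximum; the first rise marks where the run began
  subset1 <- testdf$depth[which_max:1L]
  rising <- diff(subset1) > 0
  # if nothing rises, the run starts at row 1 (which.max would return 1 for all FALSE)
  len <- if (any(rising)) which.max(rising) else which_max
  out <- testdf[(which_max - len + 1L):which_max,]
}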
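
The loop-free idea from the question (find the max index, then go back to the last non-increasing step) can also be written without reversing the vector. The following is a minimal sketch, not the answerer's code: rise_to_max and its col argument are hypothetical names, the data are assumed to contain no NA values, and a flat step (diff of 0) is treated as breaking the run, which is an assumption about what "continuously increasing" should mean.

```r
# Hypothetical helper: rows from the start of the last strictly increasing
# run up to (and including) the first maximum of df[[col]].
rise_to_max <- function(df, col = "depth") {
  x <- df[[col]]
  imax <- which.max(x)                   # index of the first maximum
  d <- diff(x)[seq_len(imax - 1L)]       # only the steps before the maximum
  breaks <- which(d <= 0)                # steps that are not increases
  # the run starts right after the last break, or at row 1 if there is none
  start <- if (length(breaks) > 0L) max(breaks) + 1L else 1L
  df[start:imax, , drop = FALSE]
}

depth <- c(1.1, 2, 1.6, 1.2, 1.6, 1.2, 1.5, 1.7, 2.1, 3.1, 3.8,
           5.2, 6.1, 7.0, 6.9, 6.9, 6.9, 6.0, 4.3, 2.1, 2.0)
temp <- c(17.9, 17.9, 17.8, 17.9, 17.7, 17.9, 17.9, 17.8, 17.7, 17.6, 17.5,
          17.3, 17.2, 17.1, 17.0, 16.9, 16.7, 16.9, 17.2, 17.5, 17.9)
testdf <- data.frame(depth = depth, temp = temp)

out <- rise_to_max(testdf)   # rows 6 to 14 of the example data
```

Because the breaks are located with which() and max(), no explicit loop or window size is needed, and the cases where the maximum is the first row or the whole prefix is increasing both fall through to start = 1.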
