子集仅不断增加值到最大值 [英] Subset only continuously increasing values to max value
问题描述
我正在尝试找到一种解决方案,该解决方案允许我通过找到一个不断增加的向量的起点来对数值数据进行子集化,并在最大值处停止.
I am attempting to find a solution that will allow me to subset numeric data by finding the start of a continuously increasing vector, and stop at the max.
一些示例数据:
if(!require(data.table)) {install.packages("data.table"); library(data.table)}
if(!require(zoo)) {install.packages("zoo"); library(zoo)}
if(!require(dplyr)) {install.packages("dplyr"); library(dplyr)}
depth <- c(1.1, 2, 1.6, 1.2, 1.6, 1.2, 1.5, 1.7, 2.1, 3.1, 3.8, 5.2, 6.1, 7.0, 6.9, 6.9, 6.9, 6.0, 4.3, 2.1, 2.0)
temp <- c(17.9, 17.9, 17.8, 17.9, 17.7, 17.9, 17.9, 17.8, 17.7, 17.6, 17.5, 17.3, 17.2, 17.1, 17.0, 16.9, 16.7, 16.9, 17.2, 17.5, 17.9)
testdf <- data.frame(depth = depth, temp = temp)
我尝试了几种解决方案,一种不起作用,另一种有效,但我觉得它在某些情况下可能有局限性.
I have tried a few solutions, one does not work, the other works but I feel it may have limitations in certain situations.
解决方案 1 仅找到 <代码>1:最大代码>.类似的解决方案建议删除任何减少的值,其中 diff
将为负数.这些不是我想要的.
Solution 1 only finds 1:max
. Similar solutions suggest removing any decreasing values, where diff
would be negative. These are not what I want.
setDT(testdf)[, .SD[1:which.max(depth)]]
depth temp
1: 1.1 17.9
2: 2.0 17.9
3: 1.6 17.8
4: 1.2 17.9
5: 1.6 17.7
6: 1.2 17.9
7: 1.5 17.9
8: 1.7 17.8
9: 2.1 17.7
10: 3.1 17.6
11: 3.8 17.5
12: 5.2 17.3
13: 6.1 17.2
14: 7.0 17.1
我正在尝试恢复它:
depth temp
6: 1.2 17.9
7: 1.5 17.9
8: 1.7 17.8
9: 2.1 17.7
10: 3.1 17.6
11: 3.8 17.5
12: 5.2 17.3
13: 6.1 17.2
14: 7.0 17.1
解决方案 2 使用 diff
和一个 rollapply
来任意合并多个行(这里是 n = 10
).在这个特定的使用中,我将额外的一行填充到最大索引,为了得到它,必须将 diff
设置为 0
,否则 rollapply
> 远低于最大值.
Solution 2 uses diff
and a rollapply
to arbitrarily bin a number of rows (n = 10
here). In this specific use, I pad an extra row to the max index, and to get that, have to set diff
to 0
, otherwise the rollapply
stops well below the max.
testdf$diff <- c(diff(testdf$depth), NA) # add diff column and NA to empty cell
testdf <- testdf[1:(which(testdf$depth == max(testdf$depth)) + 1),] # subset to max depth row, plus one
testdf$diff[(which(testdf$depth == max(testdf$depth))) : (which(testdf$depth == max(testdf$depth)) + 1)] <- 0 # set any diff entry after max depth to 0, for rollapply to work
testdf <- testdf %>%
mutate(diff = rollapply(diff, width = 10, min, align = "left", fill = 0, na.rm = TRUE)) %>%
filter(diff >= 0)
返回我想要的:
depth temp diff
1 1.2 17.9 0
2 1.5 17.9 0
3 1.7 17.8 0
4 2.1 17.7 0
5 3.1 17.6 0
6 3.8 17.5 0
7 5.2 17.3 0
8 6.1 17.2 0
9 7.0 17.1 0
10 6.9 17.0 0 # an extra padded row
使用任意窗口,此解决方案可能无法始终有效.似乎理想的解决方案只是找到最大索引,然后上升到最后一个正 diff
值,然后对该范围进行子集化,但我试图找出一种不涉及的方法循环.
This solution may not work all the time, using an arbitrary window. It seems like the ideal solution would just find the max index, then go up to the last positive diff
value, and subset that range, but I'm trying to figure out a way that doesn't involve looping.
while
循环有效,但我试图避免循环.
A while
loop works, but I was trying to avoid a loop.
findmindepth <- function(x) {
maxdi <- NA
mindi <- NA
maxdi <- (which(x$depth == max(x$depth)) - 1)
while(x$diff[maxdi] > 0) {
maxdi = maxdi - 1
}
mindi = maxdi + 1
newx <- x[mindi:(which(x$depth == max(x$depth)) + 1),]
}
推荐答案
您可以使用 run-长度编码与diff
找到所有减少/增加的起点/终点:
You can use run-length encoding with diff
to find all decreasing/increasing start/end points:
which_max <- which.max(testdf$depth)
encoding <- rle(diff(testdf$depth) > 0)
# these contain the start/end indices of all continuously increasing/decreasing subsets
ends <- cumsum(encoding$lengths) + 1L
starts <- ends - encoding$lengths
# filter out the decreasing subsets
starts <- starts[encoding$values]
ends <- ends[encoding$values]
# find the one that contains the maximum
interval <- which(starts <= which_max & ends >= which_max)
out <- testdf[starts[interval]:ends[interval],]
out
depth temp
6 1.2 17.9
7 1.5 17.9
8 1.7 17.8
9 2.1 17.7
10 3.1 17.6
11 3.8 17.5
12 5.2 17.3
13 6.1 17.2
14 7.0 17.1
实际上,如果您只关心包含最大值的子集,你可以做一些更简单的事情:
actually, if you only care about the subset that contains the maximum, you can do something simpler:
which_max <- which.max(testdf$depth)
if (which_max == 1L) {
out <- testdf[1L, , drop = FALSE]
}
else {
subset1 <- testdf$depth[which_max:1L]
len <- which.max(diff(subset1) > 0)
out <- testdf[(which_max - len + 1L):which_max,]
}
这篇关于子集仅不断增加值到最大值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!