Find start and end positions/indices of runs/consecutive values

Problem description

Problem: Given an atomic vector, find the start and end indices of runs in the vector.

Example vector with runs:

x = rev(rep(6:10, 1:5))
# [1] 10 10 10 10 10  9  9  9  9  8  8  8  7  7  6

Output from rle():

rle(x)
# Run Length Encoding
#  lengths: int [1:5] 5 4 3 2 1
#  values : int [1:5] 10 9 8 7 6

Desired output:

#   start end
# 1     1   5
# 2     6   9
# 3    10  12
# 4    13  14
# 5    15  15

The base rle class doesn't appear to provide this functionality, but the class Rle and function rle2 do. However, given how minor the functionality is, sticking to base R seems more sensible than installing and loading additional packages.

There are code snippets elsewhere (including on SO) that solve the slightly different problem of finding start and end indices for runs that satisfy some condition. I wanted something more general that could be performed in one line and didn't involve assigning temporary variables or values.

Answering my own question because I was frustrated by the lack of search results. I hope this helps somebody!

Recommended answer

Core logic:

# Example vector and rle object
x = rev(rep(6:10, 1:5))
rle_x = rle(x)

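# Compute endpoints of each run
end = cumsum(rle_x$lengths)
# Shift end by one position in base R; stats::lag() does not shift the values of a plain vector
start = c(1, head(end, -1) + 1)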

# Display results
data.frame(start, end)
#   start end
# 1     1   5
# 2     6   9
# 3    10  12
# 4    13  14
# 5    15  15
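
The same logic can be wrapped in a small function. This is only a sketch: the name run_bounds is hypothetical (it is not part of base R or any package mentioned above). It derives start directly from end and the run lengths, so no shifting of the end vector is needed.

```r
# Hypothetical helper (not part of base R): start/end index of every run in x
run_bounds <- function(x) {
  r <- rle(x)
  end <- cumsum(r$lengths)          # last index of each run
  start <- end - r$lengths + 1L     # first index of each run
  data.frame(value = r$values, start = start, end = end)
}

x <- rev(rep(6:10, 1:5))
run_bounds(x)
#   value start end
# 1    10     1   5
# 2     9     6   9
# 3     8    10  12
# 4     7    13  14
# 5     6    15  15
```

Because start is computed as end - lengths + 1, the function also returns an empty data frame for a zero-length input instead of failing.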

Tidyverse/dplyr way (data frame-centric):

library(dplyr)

rle(x) %>%
  unclass() %>%
  as.data.frame() %>%
  mutate(end = cumsum(lengths),
         start = c(1, dplyr::lag(end)[-1] + 1)) %>%
  magrittr::extract(c(1,2,4,3)) # To re-order start before end for display

Because the start and end vectors are the same length as the values component of the rle object, solving the related problem of identifying endpoints for runs meeting some condition is straightforward: filter or subset the start and end vectors using the condition on the run values.
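
As a minimal illustration of that subsetting, the sketch below keeps only the runs whose value is at least 8. The threshold and the variable names keep and res are made up for this example; any logical condition on the run values works the same way.

```r
x <- rev(rep(6:10, 1:5))
r <- rle(x)

end <- cumsum(r$lengths)
start <- end - r$lengths + 1L

# Logical condition on the run values (example threshold, chosen arbitrarily)
keep <- r$values >= 8

res <- data.frame(value = r$values, start, end)[keep, ]
res
#   value start end
# 1    10     1   5
# 2     9     6   9
# 3     8    10  12
```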
