Find consecutive sequence of zeros in R

Question

I have a really big data.frame (actually a data.table). To simplify things, let's assume my data.frame is as follows:

x <- c(1, 1, 0, 0, 1, 0, 0, NA, NA, 0)
y <- c(1, 0, NA, NA, 0, 0, 0, 1, 1, 0)
mydf <- data.frame(rbind(x, y))

I'd like to identify the rows (if any) whose last sequence consists of three consecutive zeros, ignoring NAs. In the example above, the first row ends with three consecutive zeros, but the second one does not.

I know how to do this when I only have a vector (not a data.frame):

# run-length encode the non-NA values
runs <- rle(x[!is.na(x)])

# is the last run made of zeros, and longer than 2?
last <- length(runs$lengths)
runs$lengths[last] > 2 && runs$values[last] == 0

I can obviously write a loop and get what I want, but that would be incredibly inefficient, and my actual data.frame is quite big. Any ideas on how to do this in the fastest way?

I guess apply could be useful, but I can't think of how to use it right now. Also, maybe there is a data.table way of doing this?

P.S.: This data.frame is actually a reshaped version of my original data.table. If I can somehow do the job with the data in its original format, that's fine too. To see what my data originally looks like, just think of it as:

x <- c(1, 1, 0, 0, 1, 0, 0, 0)
y <- c(1, 0, 0, 0, 0, 1, 1, 0)

myOriginalDf <- data.frame(value=c(x,y), id=rep(c('x','y'), c(length(x), length(y))))

Recommended answer

Using data.table, as your question suggests you actually want to, this does what you want as far as I can see:

library(data.table)

DT <- data.table(myOriginalDf)

# add the original order, so you can't lose it
DT[, orig := .I]

# rle by id, saving the run length as a new variable

DT[, rleLength := {rr <- rle(value); rep(rr$lengths, rr$lengths)}, by = 'id']

# key by value and length to subset 

setkey(DT, value, rleLength)

# which rows are value = 0 and length > 2

DT[list(0, unique(rleLength[rleLength>2])),nomatch=0]

##    value rleLength id orig
## 1:     0         3  x    6
## 2:     0         3  x    7
## 3:     0         3  x    8
## 4:     0         4  y   10
## 5:     0         4  y   11
## 6:     0         4  y   12
## 7:     0         4  y   13
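
Note that the result above lists every run of zeros longer than 2, whereas the question asks only whether the last run for each id is such a run. A minimal sketch of that narrower check follows. The helper name `ends_in_zero_run` and the column name `endsInZeros` are hypothetical, introduced here for illustration; the data are the example vectors from the question.

```r
library(data.table)

# Hypothetical helper: TRUE when the last run of non-NA values in `v`
# is a run of at least `n` zeros.
ends_in_zero_run <- function(v, n = 3) {
  v <- v[!is.na(v)]
  if (length(v) == 0L) return(FALSE)
  runs <- rle(v)
  k <- length(runs$lengths)
  runs$values[k] == 0 && runs$lengths[k] >= n
}

# Long format (one row per observation), as in myOriginalDf
x <- c(1, 1, 0, 0, 1, 0, 0, 0)
y <- c(1, 0, 0, 0, 0, 1, 1, 0)
DT <- data.table(value = c(x, y),
                 id = rep(c("x", "y"), c(length(x), length(y))))
res_long <- DT[, .(endsInZeros = ends_in_zero_run(value)), by = id]

# Wide format with NAs (one row per id), as in mydf
xw <- c(1, 1, 0, 0, 1, 0, 0, NA, NA, 0)
yw <- c(1, 0, NA, NA, 0, 0, 0, 1, 1, 0)
res_wide <- apply(rbind(x = xw, y = yw), 1, ends_in_zero_run)
```

With the example data, both `res_long` and `res_wide` flag `x` as TRUE and `y` as FALSE. The long-format version groups with `by = id`, so it should avoid an explicit loop over ids; whether it is fast enough on very large data is something to measure rather than assume.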
