Find consecutive sequence of zeros in R


Question

I have a really big data.frame (actually a data.table). To simplify things, let's assume my data.frame is as follows:

x <- c(1, 1, 0, 0, 1, 0, 0, NA, NA, 0) 
y <- c(1 ,0 ,NA, NA, 0, 0, 0, 1, 1, 0)
mydf <- data.frame(rbind(x,y))

I'd like to identify the rows (if any) in which the last sequence consists of three consecutive zeros, ignoring NAs. In the example above, the first row ends with three consecutive zeros once the NAs are dropped, but the second does not.

I know how to do this for a single vector (not a data.frame):

runs <- rle(x[!is.na(x)])

n <- length(runs$lengths)
runs$lengths[n] > 2 && runs$values[n] == 0



I could obviously write a loop and get what I want, but that would be incredibly inefficient, and my actual data.frame is quite big. Any ideas on how to do this in the fastest way?

I guess apply could be useful, but I can't think of how to use it right now. Also, maybe there is a data.table way of doing this?
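For reference, a minimal row-wise sketch with apply, assuming the vector check above is wrapped in a helper. The name last_run_is_zeros and its min_len parameter are hypothetical, introduced only for illustration. It drops the NAs from each row, runs rle, and tests only the final run:

```r
x <- c(1, 1, 0, 0, 1, 0, 0, NA, NA, 0)
y <- c(1, 0, NA, NA, 0, 0, 0, 1, 1, 0)
mydf <- data.frame(rbind(x, y))

# Hypothetical helper: TRUE if, after dropping NAs, the vector ends
# with a run of at least min_len zeros.
last_run_is_zeros <- function(v, min_len = 3) {
  v <- v[!is.na(v)]
  if (length(v) == 0L) return(FALSE)
  r <- rle(v)
  n <- length(r$lengths)
  r$values[n] == 0 && r$lengths[n] >= min_len
}

# apply() coerces the data.frame to a matrix and calls the helper once per row
res_rows <- apply(mydf, 1, last_run_is_zeros)
print(res_rows)
```

This is still a loop under the hood, so it may be slow on a very large table; it is only meant to show the shape of the row-wise check.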

P.S.: This data.frame is actually a reshaped version of my original data.table. If the job can somehow be done with the data in its original format, that's fine too. To see what my data originally looks like, think of it as:

x <- c(1, 1, 0, 0, 1, 0, 0, 0) 
y <- c(1 ,0 , 0, 0, 0, 1, 1, 0)

myOriginalDf <- data.frame(value=c(x,y), id=rep(c('x','y'), c(length(x), length(y))))


Recommended answer

Using data.table, as your question suggests you actually want to, and as far as I can see, this does what you want:

library(data.table)

DT <- data.table(myOriginalDf)

# add the original order, so you can't lose it
DT[, orig := .I]

# rle by id, saving the run length as a new variable

DT[, rleLength := {rr <- rle(value); rep(rr$lengths, rr$lengths)}, by = 'id']

# key by value and length to subset 

setkey(DT, value, rleLength)

# which rows are value = 0 and length > 2

DT[list(0, unique(rleLength[rleLength>2])),nomatch=0]

##    value rleLength id orig
## 1:     0         3  x    6
## 2:     0         3  x    7
## 3:     0         3  x    8
## 4:     0         4  y   10
## 5:     0         4  y   11
## 6:     0         4  y   12
## 7:     0         4  y   13
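
Note that the result above lists every run of more than two zeros, while the question asks only whether the last run of each series qualifies. A minimal sketch of that narrower check on the long-format data, assuming a hypothetical helper ends_with_zero_run (not part of data.table) applied per id:

```r
library(data.table)

x <- c(1, 1, 0, 0, 1, 0, 0, 0)
y <- c(1, 0, 0, 0, 0, 1, 1, 0)
DT2 <- data.table(value = c(x, y),
                  id = rep(c("x", "y"), c(length(x), length(y))))

# Hypothetical helper: TRUE if, after dropping NAs, the vector ends
# with a run of at least min_len zeros.
ends_with_zero_run <- function(v, min_len = 3) {
  v <- v[!is.na(v)]
  if (length(v) == 0L) return(FALSE)
  r <- rle(v)
  n <- length(r$lengths)
  r$values[n] == 0 && r$lengths[n] >= min_len
}

# One row per id, flagging whether its final run is three or more zeros
res <- DT2[, .(endsWithZeros = ends_with_zero_run(value)), by = id]
print(res)
```

Here x ends with three zeros and y ends with a single zero, so only x should be flagged. Grouping with by = id keeps the data in its original long format, so no reshaping is needed.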
