Find and replace numeric sequence in R


Question

I have a dataframe with a sequence of numbers similar to the one below:

data <- c(1,1,1,0,0,1,1,2,2,2,0,0,0,2,1,1,0,1,0,2)

What I need is a way to locate all instances of 1, 2 or 3 repetitions of 0 where the preceding and following numbers are identical, i.e. both 1 or both 2 (for example 1,0,1 or 2,0,0,2 but NOT 2,0,1).

Then I need to fill only those zeros with the surrounding value.

I have managed to locate and count the consecutive zeros:

consec <- (!data) * unlist(lapply(rle(data)$lengths, seq_len))

Then I found the rows where these runs of consecutive zeros begin:

consec <- as.matrix(consec)
first_na <- which(consec==1,arr.ind=TRUE)

But I'm stumped by the replacement step.

Many thanks for your help!

Carl

Recommended answer

Since there seems to be a lot of interest in the answer to this question, I thought I would write up an alternative regular expression method for posterity.

Using the 'gregexpr' function, you can search for patterns and use the resulting match positions and match lengths to pick out which values to change in the original vector. The advantage of using regular expressions is that we can be explicit about exactly which patterns we want to match, so we won't have any exclusion cases to worry about.

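Note: The example below works as written because we are assuming single-digit values. We could easily adapt it to other patterns, but single characters let us take a shortcut. To handle values that may have multiple digits, we would need to add a separator as part of the first concatenation ('paste') call.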
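As a rough illustration of the multi-digit case, here is a minimal sketch of a different workaround from the separator the note mentions: recode every distinct value as a single letter, so that character positions still line up with vector positions. The vector `x` and its values 10 and 25 are made up for illustration, and the trick only works for up to 26 distinct values.

```r
# Hypothetical multi-digit data; 0 still marks the gaps to fill
x <- c(10, 0, 0, 10, 25, 0, 25, 0, 10)

lev   <- unique(x)                    # distinct values: 10 0 25
codes <- letters[match(x, lev)]       # one letter per element
s     <- paste(codes, collapse = "")  # "abbacbcba"

# Letters standing for 0 and for 10 in this encoding
zero <- letters[match(0, lev)]        # "b"
ten  <- letters[match(10, lev)]       # "a"

# Same style of pattern as in the answer, built from the letters
pattern <- paste0(ten, zero, "{1,3}", ten)   # "ab{1,3}a"
m <- gregexpr(pattern, s)[[1]]
as.integer(m)              # 1
attr(m, "match.length")    # 4
```

Because each element becomes exactly one character, the match position and length can be used as vector indexes in the same way as in the single-digit example.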

Code

str.values <- paste(data, collapse="") # String representation of vector
str.matches <- gregexpr("1[0]{1,3}1", str.values) # Pattern 101/1001/10001
data[eval(parse(text=paste("c(",paste(str.matches[[1]] + 1, str.matches[[1]] - 2 + attr(str.matches[[1]], "match.length"), sep=":", collapse=","), ")")))] <- 1 # Replace zeros with ones
str.matches <- gregexpr("2[0]{1,3}2", str.values) # Pattern 202/2002/20002
data[eval(parse(text=paste("c(",paste(str.matches[[1]] + 1, str.matches[[1]] - 2 + attr(str.matches[[1]], "match.length"), sep=":", collapse=","), ")")))] <- 2 # Replace zeros with twos


Step 1: Make a single string of all the data values.

str.values <- paste(data, collapse="")
# "11100112220002110102"

This collapses the data into one long string, so we can use a regular expression on it.

Step 2: Apply a regular expression to find the locations and lengths of any matches within the string.

str.matches <- gregexpr("1[0]{1,3}1", str.values)
# [[1]]
# [1]  3 16
# attr(,"match.length")
# [1] 4 3
# attr(,"useBytes")
# [1] TRUE

In this case, we're using a regular expression to look for the first pattern: one to three zeros ([0]{1,3}) with ones on either side (1[0]{1,3}1). We match the entire pattern, including the surrounding ones, so that we don't have to check separately whether the values on both ends agree. We'll subtract those ends off in the next step.
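A quick way to see what the pattern accepts is to try it on a few short strings. This is only a sanity check on made-up candidates, not part of the original answer:

```r
# Strings standing in for short stretches of the collapsed vector
candidates <- c("101", "1001", "10001", "100001", "201")

# TRUE where a 1, then one to three zeros, then a 1 occurs
hits <- grepl("1[0]{1,3}1", candidates)
hits
# TRUE TRUE TRUE FALSE FALSE
```

Four zeros in a row are rejected because the quantifier stops at three, and "201" is rejected because the two ends differ.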

Step 3: Write ones into all the matching locations in the original vector.

data[eval(parse(text=paste("c(",paste(str.matches[[1]] + 1, str.matches[[1]] - 2 + attr(str.matches[[1]], "match.length"), sep=":", collapse=","), ")")))] <- 1
# 1 1 1 1 1 1 1 2 2 2 0 0 0 2 1 1 1 1 0 2

We're doing a few steps all at once here. First, we create a list of number sequences from the positions that matched in the regular expression. In this case, there are two matches, which start at indexes 3 and 16 and are 4 and 3 items long, respectively. This means our zeros are located at indexes (3+1):(3-2+4), or 4:5, and at (16+1):(16-2+3), or 17:17. We concatenate ('paste') these sequences using the 'collapse' option again, in case there are multiple matches. Then, we use a second concatenation to put the sequences inside a combine (c()) call. Using the 'eval' and 'parse' functions, we turn this text into code and pass it as index values to the data vector. We write ones into all of those locations.
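The same index arithmetic can be sketched without building and evaluating a code string. This is an alternative formulation, not the answer's own code. The names `starts`, `ends` and `idx` are illustrative, and the guard is there because gregexpr() returns -1 when nothing matches, in which case the computed range would turn into negative indexes.

```r
data <- c(1,1,1,0,0,1,1,2,2,2,0,0,0,2,1,1,0,1,0,2)
str.values <- paste(data, collapse = "")

m <- gregexpr("1[0]{1,3}1", str.values)[[1]]

if (m[1] != -1) {                                        # skip when there is no match
  starts <- as.integer(m) + 1                            # first zero of each match
  ends   <- as.integer(m) + attr(m, "match.length") - 2  # last zero of each match
  idx    <- unlist(Map(seq, starts, ends))               # 4 5 17
  data[idx] <- 1
}

data
# 1 1 1 1 1 1 1 2 2 2 0 0 0 2 1 1 1 1 0 2
```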

Step x: Repeat for each pattern. In this case, we need to do a second search to find one to three zeros with twos on either side, and then run the same statement as in Step 3, but assigning twos instead of ones.

str.matches <- gregexpr("2[0]{1,3}2", str.values)
# [[1]]
# [1] 10
# attr(,"match.length")
# [1] 5
# attr(,"useBytes")
# [1] TRUE

data[eval(parse(text=paste("c(",paste(str.matches[[1]] + 1, str.matches[[1]] - 2 + attr(str.matches[[1]], "match.length"), sep=":", collapse=","), ")")))] <- 2
# 1 1 1 1 1 1 1 2 2 2 2 2 2 2 1 1 1 1 0 2
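The repetition can also be folded into a loop. The following is a minimal sketch, assuming single-digit values as above. The function name `fill_gaps` and its arguments are hypothetical and not part of base R. It also swaps the closing digit for a lookahead (with perl = TRUE), because gregexpr() returns non-overlapping matches: in a stretch like 1,0,1,0,1 the middle 1 would otherwise be consumed by the first match and the second zero would be skipped.

```r
# fill_gaps() is a hypothetical helper, sketched under the single-digit assumption
fill_gaps <- function(x, values = c(1, 2), max_gap = 3) {
  s   <- paste(x, collapse = "")   # one character per element
  out <- x
  for (v in values) {
    # v, then 1..max_gap zeros, then a lookahead for v (not consumed)
    pattern <- sprintf("%d0{1,%d}(?=%d)",
                       as.integer(v), as.integer(max_gap), as.integer(v))
    m <- gregexpr(pattern, s, perl = TRUE)[[1]]
    if (m[1] == -1) next           # gregexpr() returns -1 when nothing matches
    starts <- as.integer(m)
    len    <- attr(m, "match.length")
    # zeros run from the character after v to the end of the match
    idx <- unlist(Map(seq, starts + 1, starts + len - 1))
    out[idx] <- v
  }
  out
}

data <- c(1,1,1,0,0,1,1,2,2,2,0,0,0,2,1,1,0,1,0,2)
fill_gaps(data)
# 1 1 1 1 1 1 1 2 2 2 2 2 2 2 1 1 1 1 0 2
```

All patterns are searched in the original string and written into a copy, so the order of `values` does not affect the result.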


Update: I realized the original problem said to match one to three zeros in a row, rather than the "two or more" that I had written into the original code. I have updated the regular expressions and the explanation, although the rest of the code remains the same.
