Sequence length encoding using R
Problem description
Is there a way to encode increasing integer sequences in R, analogous to encoding run lengths using run length encoding (rle)?
I'll illustrate with an example:
Analogy: Run length encoding
    r <- c(rep(1, 4), 2, 3, 4, rep(5, 5))
    rle(r)
    Run Length Encoding
      lengths: int [1:5] 4 1 1 1 5
      values : num [1:5] 1 2 3 4 5
Desired: sequence length encoding
    s <- c(1:4, rep(5, 4), 6:9)
    s
    [1] 1 2 3 4 5 5 5 5 6 7 8 9
    somefunction(s)
    Sequence lengths
      lengths: int [1:4] 5 1 1 5
      value1 : num [1:4] 1 5 5 5
Edit 1
Thus, somefunction(1:10) will give the result:
    Sequence lengths
      lengths: int [1:1] 10
      value1 : num [1:1] 1
This result means that there is an integer sequence of length 10 with a starting value of 1, i.e. seq(1, 10).
Note that there isn't a mistake in my example result. The vector in fact ends in the sequence 5:9, not 6:9 which was used to construct it.
My use case is that I am working with survey data in an SPSS export file. Each subquestion in a grid of questions will have a name of the pattern paste("q", 1:5), but sometimes there is an "other" category which will be marked q_99, q_other or something else. I wish to find a way of identifying the sequences.
Edit 2
In a way, my desired function is the inverse of the base function sequence, with the start value, value1 in my example, added.
    lengths <- c(5, 1, 1, 5)
    value1 <- c(1, 5, 5, 5)
    s
    [1] 1 2 3 4 5 5 5 5 6 7 8 9
    sequence(lengths) + rep(value1 - 1, lengths)
    [1] 1 2 3 4 5 5 5 5 6 7 8 9
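The reconstruction in Edit 2 can be wrapped in a small helper to make the "inverse" relationship explicit. This is only a sketch: decode_seq is a hypothetical name that does not appear in the original post, and it assumes both arguments have the same length.

```r
# Hypothetical helper (illustrative name, not from the original post):
# rebuild a vector from run lengths and run start values.
decode_seq <- function(lengths, value1) {
  # sequence() counts 1..n within each run; adding (start - 1) shifts
  # every run so that it begins at its own start value.
  sequence(lengths) + rep(value1 - 1, lengths)
}

s <- c(1:4, rep(5, 4), 6:9)
decoded <- decode_seq(c(5, 1, 1, 5), c(1, 5, 5, 5))

# Compare by value; the storage types (integer vs. double) may differ.
all(decoded == s)
```

Any encoder that answers the question should satisfy this round trip: decoding its output gives back the input vector.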
Edit 3
I should have stated that for my purposes a sequence is defined as an increasing run of consecutive integers, as opposed to any monotonically increasing sequence, e.g. c(4,5,6,7) but not c(2,4,6,8) nor c(5,4,3,2,1). However, any other integer can appear between sequences.
This means a solution should be able to cope with this test case:
    somefunction(c(2, 4, 1:4, 5, 5))
    Sequence lengths
      lengths: int [1:4] 1 1 5 1
      value1 : num [1:4] 2 4 1 5
In the ideal case, the solution can also cope with the use case suggested originally, which would include characters in the vector, e.g.
    somefunction(c(2, 4, 1:4, 5, "other"))
    Sequence lengths
      lengths: int [1:5] 1 1 5 1 1
      value1 : num [1:5] 2 4 1 5 "other"
Solution
EDIT: added control to handle character vectors as well.
Based on rle, I came up with the following solution:
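    somefunction <- function(x){
        # coerce non-numeric input; strings that are not numbers become NA
        if(!is.numeric(x)) x <- as.numeric(x)
        n <- length(x)
        # TRUE where an element is not its predecessor plus one
        y <- x[-1L] != x[-n] + 1L
        # end position of each run; the last element always ends a run
        i <- c(which(y|is.na(y)),n)
        list(
            lengths = diff(c(0L,i)),
            values = x[head(c(0L,i)+1L,-1L)]
        )
    }

    > s <- c(2,4,1:4, rep(5, 4), 6:9,4,4,4)
    > somefunction(s)
    $lengths
    [1] 1 1 5 1 1 5 1 1 1

    $values
    [1] 2 4 1 5 5 5 4 4 4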
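To see why this works, the intermediate values can be traced on the test case from Edit 3. The sketch below repeats the body of the function step by step; run_lengths and run_starts are illustrative names, not part of the original answer.

```r
x <- c(2, 4, 1:4, 5, 5)   # 2 4 1 2 3 4 5 5
n <- length(x)

# Compare every element with its predecessor plus one.
# TRUE marks a position where the current run ends.
y <- x[-1L] != x[-n] + 1L
# y is: TRUE TRUE FALSE FALSE FALSE FALSE TRUE

# End position of each run; the final element always closes a run.
i <- c(which(y | is.na(y)), n)
# i is: 1 2 7 8

# Differences between consecutive end positions give the run lengths.
run_lengths <- diff(c(0L, i))
# run_lengths is: 1 1 5 1

# Each run starts one position after the previous run ended.
run_starts <- x[head(c(0L, i) + 1L, -1L)]
# run_starts is: 2 4 1 5
```

The is.na(y) term matters only when the input contains NA: a comparison involving NA is itself NA, and treating it as a break puts every NA into a run of its own.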
This one works on every test case I tried and uses vectorized operations rather than ifelse clauses, which should make it faster. It converts strings to NA, so the output stays numeric.
    > S <- c(4,2,1:5,5, "other" , "other",4:6,2)
    > somefunction(S)
    $lengths
    [1] 1 1 5 1 1 1 3 1

    $values
    [1]  4  2  1  5 NA NA  4  2

    Warning message:
    In somefunction(S) : NAs introduced by coercion
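If the labels themselves should be kept instead of being turned into NA, as in the ideal case described in the question, one possible variant is to detect the runs on a coerced copy but take the start values from the original vector. This is only a sketch under that assumption: seqle_keep_labels is a hypothetical name, and value1 comes back as a character vector whenever the input is character.

```r
# Hypothetical variant (illustrative name, not from the original answer).
seqle_keep_labels <- function(x) {
  # Coerced copy used only for detecting runs; labels become NA here.
  num <- suppressWarnings(as.numeric(x))
  n <- length(num)
  y <- num[-1L] != num[-n] + 1L
  i <- c(which(y | is.na(y)), n)
  starts <- head(c(0L, i) + 1L, -1L)
  list(
    lengths = diff(c(0L, i)),
    # Index the original vector so labels such as "other" survive.
    value1 = x[starts]
  )
}

S <- c(4, 2, 1:5, 5, "other", "other", 4:6, 2)
res <- seqle_keep_labels(S)
res$lengths  # 1 1 5 1 1 1 3 1
res$value1   # "4" "2" "1" "5" "other" "other" "4" "2"
```

The run lengths are the same as in the answer above; only the start values differ, because they are read from the uncoerced input.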