Sequence length encoding using R

Published: 2017/8/16 19:51:27 (tags: r, encoding)

Question

Is there a way to encode increasing integer sequences in R, analogous to encoding run lengths using run length encoding (rle)?

I'll illustrate with an example:

Analogy: Run length encoding
r <- c(rep(1, 4), 2, 3, 4, rep(5, 5))
rle(r)
Run Length Encoding
  lengths: int [1:5] 4 1 1 1 5
  values : num [1:5] 1 2 3 4 5
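For reference, base R pairs rle() with an inverse, inverse.rle(), and the sequence-length encoding asked for here would need an analogous encode/decode pair. A minimal illustration of the rle round trip, using only base R:

```r
# Run length encoding round trip in base R:
# rle() encodes, inverse.rle() expands back to the original vector.
r <- c(rep(1, 4), 2, 3, 4, rep(5, 5))
enc <- rle(r)
enc$lengths                     # 4 1 1 1 5
enc$values                      # 1 2 3 4 5
inverse.rle(enc)                # 1 1 1 1 2 3 4 5 5 5 5 5
identical(inverse.rle(enc), r)  # TRUE
```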
Desired: sequence length encoding
s <- c(1:4, rep(5, 4), 6:9)
s
[1] 1 2 3 4 5 5 5 5 6 7 8 9

somefunction(s)
Sequence lengths
  lengths: int [1:4] 5 1 1 5
  value1 : num [1:4] 1 5 5 5
Edit 1

Thus, somefunction(1:10) will give the result:
Sequence lengths
  lengths: int [1:1] 10
  value1 : num [1:1] 1 
This result means that there is an integer sequence of length 10 with a starting value of 1, i.e. seq(1, 10).

Note that there isn't a mistake in my example result: the vector in fact ends in the sequence 5:9, not the 6:9 that was used to construct it.

My use case is that I am working with survey data in an SPSS export file.  Each subquestion in a grid of questions will have a name of the pattern paste("q", 1:5), but sometimes there is an "other" category which will be marked q_99, q_other or something else.  I wish to find a way of identifying the sequences.
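For the survey use case, one possible approach is to strip each variable name down to its trailing suffix and group consecutive integer suffixes. This is a sketch under stated assumptions: it assumes names end in an underscore-separated suffix such as q_1 or q_99 (the question's paste("q", 1:5) would actually produce "q 1" with a space), and the helper name group_by_suffix_runs is hypothetical.

```r
# Hypothetical helper: group variable names into runs whose trailing
# integer suffixes are consecutive (q_1, q_2, ...). Names without a
# numeric suffix (e.g. "q_other") each form a run of their own.
# Assumes the suffix follows the last "_" in each name.
group_by_suffix_runs <- function(nm) {
  if (length(nm) == 0L) return(list())
  suffix <- sub("^.*_", "", nm)                # text after the last "_"
  num <- suppressWarnings(as.integer(suffix))  # non-numeric suffix -> NA
  n <- length(num)
  # A new run starts at element k if either neighbour is NA or
  # num[k] is not exactly num[k - 1] + 1.
  brk <- c(TRUE, is.na(num[-1L]) | is.na(num[-n]) | num[-1L] != num[-n] + 1L)
  unname(split(nm, cumsum(brk)))               # one character vector per run
}

group_by_suffix_runs(c("q_1", "q_2", "q_3", "q_4", "q_5", "q_99", "q_other"))
```

With these names the result should be three groups: q_1 to q_5, then q_99 on its own, then q_other on its own.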

Edit 2

In a way, my desired function is the inverse of the base function sequence(), with the start value (value1 in my example) added:
lengths <- c(5, 1, 1, 5)
value1 <- c(1, 5, 5, 5)

s
[1] 1 2 3 4 5 5 5 5 6 7 8 9
sequence(lengths) + rep(value1-1, lengths) 
[1] 1 2 3 4 5 5 5 5 6 7 8 9
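The identity above can be wrapped as a decoder, the counterpart of inverse.rle() for this encoding. A minimal sketch; the name seq_decode is hypothetical:

```r
# Hypothetical decoder: rebuilds the original vector from the
# sequence-length encoding via the sequence() + rep() identity.
seq_decode <- function(lengths, value1) {
  stopifnot(length(lengths) == length(value1))
  # sequence(lengths) counts 1..len within each run; shifting each run
  # by (value1 - 1) makes it start at its own value1.
  sequence(lengths) + rep(value1 - 1, lengths)
}

seq_decode(c(5, 1, 1, 5), c(1, 5, 5, 5))  # 1 2 3 4 5 5 5 5 6 7 8 9
```

If you are on R 4.0.0 or later, sequence() should also accept a from argument, so sequence(lengths, from = value1) ought to give the same result; check ?sequence on your installation before relying on it.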
Edit 3

I should have stated that, for my purposes, a sequence is a run of consecutive integers, each exactly one greater than the previous, as opposed to any monotonically increasing sequence: e.g. c(4,5,6,7), but not c(2,4,6,8) nor c(5,4,3,2,1). However, any other integer can appear between sequences.

This means a solution should be able to cope with this test case:
somefunction(c(2, 4, 1:4, 5, 5))
    Sequence lengths
      lengths: int [1:4] 1 1 5 1
      value1 : num [1:4] 2 4 1 5 
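Under this definition, a new sequence starts at the first element and wherever the step from the previous element is anything other than exactly +1. One way to sketch an encoder from that rule is to mark the start points, number the groups with cumsum(), and count them with tabulate(). This assumes numeric input without NAs; the name seq_encode is hypothetical.

```r
# Sketch of an encoder under the "consecutive integers" definition.
# Assumes numeric x without NAs (diff() would propagate NA).
seq_encode <- function(x) {
  if (length(x) == 0L) return(list(lengths = integer(0), value1 = x))
  starts <- c(TRUE, diff(x) != 1)        # TRUE at the first element of each sequence
  list(
    lengths = tabulate(cumsum(starts)),  # number of elements in each sequence
    value1  = x[starts]                  # first element of each sequence
  )
}

seq_encode(c(2, 4, 1:4, 5, 5))   # lengths 1 1 5 1, value1 2 4 1 5
```

The same function gives lengths 5 1 1 5 and value1 1 5 5 5 for the original example, and a single sequence of length 10 starting at 1 for 1:10.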
Ideally, the solution would also cope with the use case I originally described, which includes characters in the vector, e.g.
somefunction(c(2, 4, 1:4, 5, 5, "other"))
    Sequence lengths
      lengths: int [1:5] 1 1 5 1 1
      value1 : num [1:5] 2 4 1 5 "other"
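A numeric vector cannot hold "other", so one way to approximate this ideal case is to return value1 as a character vector. In the sketch below, any non-numeric entry breaks the surrounding sequence and becomes a run of length 1 with its label kept. The function name seqle_chr is hypothetical, and treating every non-numeric entry as its own run is an assumption.

```r
# Sketch: sequence-length encoding that keeps non-numeric labels.
# value1 is returned as character so entries like "other" survive.
seqle_chr <- function(x) {
  x <- as.character(x)
  n <- length(x)
  if (n == 0L) return(list(lengths = integer(0), value1 = character(0)))
  num <- suppressWarnings(as.numeric(x))  # "other" -> NA, warning suppressed
  # Break wherever either neighbour is non-numeric or the step isn't +1.
  brk <- c(TRUE, is.na(num[-1L]) | is.na(num[-n]) | num[-1L] != num[-n] + 1)
  list(
    lengths = tabulate(cumsum(brk)),  # size of each run
    value1  = x[brk]                  # first element (or label) of each run
  )
}

seqle_chr(c(2, 4, 1:4, 5, 5, "other"))
# lengths 1 1 5 1 1, value1 "2" "4" "1" "5" "other"
```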

Solution
EDIT: added handling for character vectors as well.

Based on rle, I came to the following solution:
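somefunction <- function(x){

    if(!is.numeric(x)) x <- as.numeric(x)   # strings become NA (with a warning)
    n <- length(x)
    y <- x[-1L] != x[-n] + 1L               # TRUE where x[k+1] does not continue the sequence
    i <- c(which(y|is.na(y)),n)             # index of the last element of each sequence

    list(
      lengths = diff(c(0L,i)),              # gaps between end points = sequence lengths
      values = x[head(c(0L,i)+1L,-1L)]      # first element of each sequence
    )

}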

> s <- c(2,4,1:4, rep(5, 4), 6:9,4,4,4)
> somefunction(s)
$lengths
[1] 1 1 5 1 1 5 1 1 1

$values
[1] 2 4 1 5 5 5 4 4 4
This one works on every test case I tried and is fully vectorized, with no ifelse() calls, so it should run faster. It converts strings to NA so that the output stays numeric; as.numeric() emits a warning when it does this:
> S <- c(4,2,1:5,5, "other" , "other",4:6,2)

> somefunction(S)
$lengths
[1] 1 1 5 1 1 1 3 1

$values
[1]  4  2  1  5 NA NA  4  2

Warning message:
In somefunction(S) : NAs introduced by coercion
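To get the rle-style printout shown in the question ("Sequence lengths" followed by lengths and value1 lines), the result can be given a class with its own print method, in the spirit of how rle objects print. A sketch: the class name seqle and both functions are hypothetical, the encoding logic is the answer's, and numeric input is assumed.

```r
# Hypothetical constructor: the answer's encoding logic, with the
# result tagged as class "seqle" (numeric input assumed).
seqle <- function(x) {
  n <- length(x)
  y <- x[-1L] != x[-n] + 1L          # TRUE where the next element starts a new sequence
  i <- c(which(y | is.na(y)), n)     # last index of each sequence
  structure(
    list(lengths = diff(c(0L, i)),                # sequence lengths
         value1  = x[head(c(0L, i) + 1L, -1L)]),  # start values
    class = "seqle"
  )
}

# Print method mimicking the layout of rle()'s printed output.
print.seqle <- function(x, ...) {
  cat("Sequence lengths\n")
  cat("  lengths: "); utils::str(x$lengths)
  cat("  value1 : "); utils::str(x$value1)
  invisible(x)
}

seqle(c(1:4, rep(5, 4), 6:9))
```

Because the object is still a list underneath, unclass() recovers the plain lengths/value1 list when needed.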


                        