Indexing sequence chunks using data.table

Published: 2017/3/12 12:53:35 · Tags: r, indexing, data.table, sequence, chunks


Question

Say I have a data set where sequences of length 1 are illegal, sequences of length 2 are legal, and sequences longer than 5 are illegal, but a longer sequence may be broken up into sequences of length <= 5.
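One way to read these rules, assuming every row of a long run has to end up in some chunk and that any chunk of length 2 to 5 is legal (the question only states length 2 explicitly), is as an integer-splitting problem. The hypothetical helper `split_run()` below is a base-R sketch under those assumptions, not the asker's method: it uses the fewest chunks that respect the 5-row limit and spreads the rows evenly, which never leaves a length-1 chunk for runs of 2 or more.

```r
# Hypothetical helper (not part of data.table): split a run of n rows into
# chunk lengths, assuming chunks of length 2 to max_len are all legal.
split_run <- function(n, max_len = 5L) {
  if (n < 2L) return(integer(0))     # a length-1 run cannot be made legal
  k <- ceiling(n / max_len)          # fewest chunks that respect max_len
  base <- n %/% k                    # spread the rows as evenly as possible
  extra <- n %% k                    # the first `extra` chunks get one more row
  as.integer(c(rep(base + 1, extra), rep(base, k - extra)))
}

split_run(7)    # 4 3
split_run(11)   # 4 4 3
split_run(13)   # 5 4 4
```

By contrast, a greedy 5-at-a-time split of 11 rows would leave a length-1 tail (5 + 5 + 1), which is exactly the kind of stranded row the question runs into.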
library(data.table)
set.seed(1)
DT1 <- data.table(smp = 1, R = sample(0:1, 20000, replace = TRUE), Seq = 0L)
DT1[, smp := 1:length(smp)]
DT1[, Seq := seq(.N), by = list(cumsum(c(0, abs(diff(R)))))]
This last line comes directly from the question "Creating a sequence in a data.table depending on a column".
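The grouping trick in that line can be checked on a toy vector. `diff(R)` is non-zero exactly where a new run starts, so the cumulative sum gives every run its own id; for 0/1 data like this, data.table's `rleid(R)` produces the same grouping numbered from 1 instead of 0. The sketch below uses base R (`ave()` and `rle()`) in place of data.table's `by=` so it runs without the package; the vector `R` is made up for illustration.

```r
# Made-up example vector (not the 20000-row sample from the question)
R <- c(0L, 0L, 1L, 1L, 1L, 0L, 1L)

# diff(R) is non-zero where a new run starts, so the cumulative sum
# assigns one id per run: 0 0 1 1 1 2 3
grp <- cumsum(c(0, abs(diff(R))))

# Position within each run, the base-R analogue of Seq := seq(.N) by grp:
# 1 2 1 2 3 1 1
Seq <- ave(seq_along(R), grp, FUN = seq_along)

# Base-R equivalent of data.table::rleid(R): 1 1 2 2 2 3 4
runs <- rle(R)
run_id <- rep(seq_along(runs$lengths), runs$lengths)
```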
DT1[, fix_min:=ifelse((R==TRUE & Seq==1) | (R==FALSE), FALSE, TRUE)]
fixmin_idx2 <- which(DT1[, fix_min==TRUE])
DT1[fixmin_idx2 -1, fix_min:=TRUE]
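The two-step marking above (flag R == 1 rows with Seq > 1, then also flag the row before each flagged row) amounts to flagging every R == 1 row that belongs to a run of length 2 or more. The base-R sketch below checks that equivalence on a made-up vector, using `rle()` to get each row's run length; it is an illustration, not a drop-in replacement for the data.table code.

```r
# Made-up example: runs of 1s with lengths 1, 2 and 4
R <- c(1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L)
runs <- rle(R)
Seq <- sequence(runs$lengths)                # position within each run

# The question's two steps, translated to plain vectors
step1 <- !((R == 1L & Seq == 1L) | R == 0L)  # R == 1 rows past the first
step2 <- step1
step2[which(step1) - 1L] <- TRUE             # also flag the row before each

# Direct version: R == 1 and the row's run has length 2 or more
run_len <- rep(runs$lengths, runs$lengths)
fix_min <- R == 1L & run_len > 1L

identical(step2, fix_min)   # TRUE
```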
Now my legal length-2 sequences are properly marked. Next, break up the sequences longer than 5.
DT1[R==1 & Seq==6, fix_min:=FALSE]
DT1[,Seq2:=seq(.N), by=list(cumsum(c(0, abs(diff(fix_min)))))]
DT1[R==1 & Seq2==6, fix_min:=FALSE]
fixSeq2_idx7 <- which(DT1[,fix_min==TRUE] & DT1[,Seq2==7])
fixSeq2_idx7
[1] 10203 13228
DT1[fixSeq2_idx7,]
 smp R Seq fix_min Seq2
1: 10203 1  13    TRUE    7
2: 13228 1  13    TRUE    7
DT1[fixSeq2_idx7 + 1,]
 smp R Seq fix_min Seq2
1: 10204 1  14    TRUE    8
2: 13229 0   1   FALSE    1
Now I want to test whether a Seq2==7 is followed by a Seq2==8, which would make a legal length-2 sequence. I have one 7 followed by an 8 and one not followed by an 8, and there I'm stuck. Everything I've tried either sets all fix_min to TRUE or alternates TRUE and FALSE.
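Whether a Seq2 == 7 row is followed by a Seq2 == 8 row comes down to comparing each row with the next one, a "lead" of the column. In data.table that is `shift(Seq2, type = "lead")`; the sketch below builds the same vector in base R with `c(Seq2[-1], NA)`. The values are hypothetical toy data, and the final step assumes that a 7 with no following 8 should be treated as an illegal length-1 piece and set to FALSE.

```r
# Hypothetical toy columns: row 2 is a 7 followed by an 8,
# row 5 is a 7 followed by a 1
Seq2    <- c(6L, 7L, 8L, 1L, 7L, 1L)
fix_min <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)

# Value of Seq2 in the next row (NA for the last row);
# data.table::shift(Seq2, type = "lead") gives the same vector
next_Seq2 <- c(Seq2[-1], NA)

# A 7 directly followed by an 8: a legal length-2 piece
seven_then_eight <- fix_min & Seq2 == 7L & !is.na(next_Seq2) & next_Seq2 == 8L
which(seven_then_eight)   # 2

# A 7 not followed by an 8: a stranded length-1 piece
lonely_seven <- fix_min & Seq2 == 7L & (is.na(next_Seq2) | next_Seq2 != 8L)
which(lonely_seven)       # 5

# Assumption: stranded pieces are illegal, so switch them off
fix_min[lonely_seven] <- FALSE
```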

Any guidance greatly appreciated.
Solution
If I understand your question correctly, you want to set fix_min to FALSE when R == 0 or when R == 1 & (1 <= Seq < 6 | Seq > 6). Then the following should give you what you want:
# recreating the data from your first code block
library(data.table)
set.seed(1)
DT1 <- data.table(R=sample(0:1, 20000, rep=TRUE))[, smp:=.I
                                                  ][, Seq:=seq(.N), by=rleid(R)
                                                    ][, Seq2 := Seq[.N], by=rleid(R)]

# adding the needed 'fix_min' column
DT1[, fix_min := (R==1 & Seq[.N] > 1 & Seq%%6!=0), by=rleid(R)
    ][R==1 & Seq%%6==1 & Seq2%%6==1 & Seq==Seq2, fix_min := FALSE]
Explanation:


data.table(R=sample(0:1, 20000, rep=TRUE)) creates the base of the data.table
[, smp:=.I] creates a row index (.I holds the row numbers) and adds it to the data.table
by=rleid(R) identifies the sequences; to see what it does try: data.table(R=sample(0:1, 20000, rep=TRUE))[, seq.id:=rleid(R)]
[, Seq:=seq(.N), by=rleid(R)] creates an index for each sequence and adds it to the data.table; the sequences are identified by rleid(R)
[, Seq2 := Seq[.N], by=rleid(R)] creates a variable that holds the length of each sequence (the last Seq value of the run) on every row of that sequence
fix_min := (R==1 & Seq[.N] > 1 & Seq%%6!=0) creates a logical vector with TRUE values where R==1 & the length of the sequence is larger than one (Seq[.N] > 1) excluding the values where the sequence number is a multiple of 6 (Seq%%6!=0)
R==1 & Seq%%6==1 & Seq2%%6==1 & Seq==Seq2 filters the data.table to rows where R==1, the sequence value is 7, 13, 19, etc. (Seq%%6==1), the length of the sequence is 7, 13, 19, etc. (Seq2%%6==1), and the row is the last one of its sequence (Seq==Seq2). fix_min := FALSE then sets fix_min to FALSE for those rows.
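The answer's rule can be traced by hand on a small vector. In this scheme every sixth row of a run (Seq %% 6 == 0) becomes a break, and a single row left over after a break (a run of length 7, 13, 19, ...) is switched off. The sketch below reproduces the two `fix_min` steps in base R, with `rle()` standing in for `rleid(R)` and a made-up `R` containing runs of 1s of length 1, 7 and 8.

```r
# Made-up example: a run of length 1, a run of 7 and a run of 8 (R == 1)
R <- c(1L, 0L, rep(1L, 7), 0L, rep(1L, 8))
runs <- rle(R)
Seq  <- sequence(runs$lengths)             # Seq := seq(.N), by = rleid(R)
Seq2 <- rep(runs$lengths, runs$lengths)    # Seq2 := Seq[.N], by = rleid(R)

# Step 1: R == 1 in a run longer than 1; every sixth row becomes a break
fix_min <- R == 1L & Seq2 > 1L & Seq %% 6L != 0L

# Step 2: switch off the single row left after a break (run length 7, 13, ...)
fix_min[R == 1L & Seq %% 6L == 1L & Seq2 %% 6L == 1L & Seq == Seq2] <- FALSE

fix_min[3:9]    # run of 7: TRUE TRUE TRUE TRUE TRUE FALSE FALSE
fix_min[11:18]  # run of 8: TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE
```

The run of 7 gets a break at within-run position 6 and its stranded seventh row is set to FALSE, while the run of 8 keeps positions 7 and 8 as a legal length-2 chunk.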

