Select subsequences from a dataframe

Views: 74 | Published: 2020/10/17 0:49:46 | Tags: r, dataframe, sequence


Problem description

I have the following dataframe:
df <- structure(list(a = c(1, 43, 22, 12, 35, 113, 54, 94), b = c("a", 
"b", "c", "d", "e", "f", "g", "h")), .Names = c("a", "b"), row.names = c(NA, 
-8L), class = c("tbl_df", "tbl", "data.frame"))
From this data I want to select consecutive subsequences of a certain length. For example, for a sequence length of two, I want to select rows 1-2, 2-3, 3-4, and so on until the last row of the data frame. Each subsequence should then be labelled. 

With a subsequence length of 2, the new df with its sequence labels would look like this:
a   b   seq_label
1   a   1 # First subsequence, row 1-2      
43  b   1 # 
43  b   2 # Second subsequence, row 2-3     
22  c   2 #         
22  c   3 # Third subsequence, row 3-4
12  d   3 #     
12  d   4
35  e   4       
35  e   5
113 f   5       
113 f   6
54  g   6       
54  g   7
94  h   7
Similarly, with a subsequence length of 3:
a   b  seq_label
1   a  1 # First subsequence, row 1-3
43  b  1 #          
22  c  1 #
43  b  2 # Second subsequence, row 2-4
22  c  2 #
12  d  2 #
22  c  3 # Third subsequence, row 3-5
12  d  3 #
35  e  3 #
12  d  4
35  e  4
113 f  4
35  e  5
113 f  5
54  g  5
113 f  6
54  g  6
94  h  6
....

Thanks to @drjones's suggested answer, I arrived at this solution (map_dfr comes from the purrr package, and n is the subsequence length):
map_dfr(1:(nrow(df) - n + 1), function(i) cbind(df[i:(i + n - 1), ], seq_label = i))
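
The same loop-and-bind idea can be sketched in base R, without purrr. This is only an illustration: it assumes a plain data.frame stand-in for df, and the names windows and res are hypothetical.

```r
# Plain data.frame stand-in for the question's df (an assumption for this sketch)
df <- data.frame(a = c(1, 43, 22, 12, 35, 113, 54, 94),
                 b = letters[1:8],
                 stringsAsFactors = FALSE)
n <- 2  # subsequence length

# One small data frame per window, labelled with the window's start row
windows <- lapply(seq_len(nrow(df) - n + 1), function(i) {
  cbind(df[i:(i + n - 1), ], seq_label = i)
})

# Stack the windows and reset the row names
res <- do.call(rbind, windows)
rownames(res) <- NULL
res
```

Using seq_len() instead of 1:(...) keeps the loop empty, rather than counting backwards, if n is ever larger than nrow(df).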

Solution
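We can build the row indices with outer, which adds each subsequence's start row to the offsets 0 to n - 1:
n <- 2
i <- 1:(nrow(df) - (n - 1))  # start row of each subsequence

# 1:n - 1 parses as (1:n) - 1, i.e. the offsets 0, ..., n - 1
cbind(df[t(outer(i, 1:n - 1, `+`)), ],
      seq_label = rep(i, each = n))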
#      a b seq_label
# 1    1 a         1
# 2   43 b         1
# 3   43 b         2
# 4   22 c         2
# 5   22 c         3
# 6   12 d         3
# 7   12 d         4
# 8   35 e         4
# 9   35 e         5
# 10 113 f         5
# 11 113 f         6
# 12  54 g         6
# 13  54 g         7
# 14  94 h         7
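
To see why the outer call yields the right rows, it may help to look at the index matrix on its own. The following is a minimal sketch with n = 3, assuming a plain data.frame stand-in for df; the names idx and rows are only for illustration.

```r
# Plain data.frame stand-in for the question's df (an assumption for this sketch)
df <- data.frame(a = c(1, 43, 22, 12, 35, 113, 54, 94),
                 b = letters[1:8],
                 stringsAsFactors = FALSE)

n <- 3                           # subsequence length
i <- 1:(nrow(df) - (n - 1))      # start rows of the windows

# One row per window, one column per offset 0, ..., n - 1
idx <- outer(i, 0:(n - 1), `+`)

# R flattens matrices column by column, so transposing first
# makes the flattened vector run window by window
rows <- as.vector(t(idx))

res <- cbind(df[rows, ], seq_label = rep(i, each = n))
rownames(res) <- NULL
head(res, 6)
```

Without the t(), the flattened indices would list every window's first row, then every second row, and so on, which would not line up with rep(i, each = n).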




...or kronecker:
cbind(df[kronecker(X = i, Y = 1:n - 1, FUN = `+`), ],
      seq_label = rep(i, each = n))
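
As a quick sanity check, the kronecker call should produce the same row order as the transposed outer matrix. This sketch assumes an 8-row data frame, as in the question; the names via_outer and via_kronecker are hypothetical.

```r
n <- 2
i <- 1:(8 - (n - 1))   # start rows for an 8-row data frame

# Row order from the outer approach, flattened window by window
via_outer <- as.vector(t(outer(i, 0:(n - 1), `+`)))

# kronecker pairs each start row with every offset, already in window order
via_kronecker <- as.vector(kronecker(X = i, Y = 0:(n - 1), FUN = `+`))

all(via_outer == via_kronecker)
```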




...or embed:
i <- 1:nrow(df)
cbind(df[as.vector(t(embed(i, n)[ , n:1])), ],
      seq_label = rep(head(i, -(n - 1)), each = n))
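
The embed variant can be wrapped in a small helper for reuse. This is a sketch under stated assumptions, not part of the original answer: label_windows is a hypothetical name, the input is a plain data.frame, and the labels come from seq_len() because head(i, -(n - 1)) returns an empty vector when n is 1.

```r
# Hypothetical helper: all consecutive windows of length n, labelled by start row
label_windows <- function(df, n) {
  stopifnot(n >= 1, n <= nrow(df))
  starts <- seq_len(nrow(df) - n + 1)

  # embed() returns each window in reverse order, so the columns are flipped
  # back; drop = FALSE keeps the matrix shape when n is 1
  idx <- embed(seq_len(nrow(df)), n)[, n:1, drop = FALSE]
  rows <- as.vector(t(idx))

  out <- cbind(df[rows, , drop = FALSE], seq_label = rep(starts, each = n))
  rownames(out) <- NULL
  out
}

# Plain data.frame stand-in for the question's df (an assumption for this sketch)
df <- data.frame(a = c(1, 43, 22, 12, 35, 113, 54, 94),
                 b = letters[1:8],
                 stringsAsFactors = FALSE)

out2 <- label_windows(df, 2)
out3 <- label_windows(df, 3)
```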


                        