Creating a sequence in a data.table depending on a column

Views: 105  Published: 2017/3/12 12:30:51  r data.table


Problem Description

Say I have the following data.table:

library(data.table)

DT <- data.table(R=sample(0:1, 10000, replace=TRUE), Seq=0)

Which returns something like:

       R Seq
    1: 1   0
    2: 1   0
    3: 0   0
    4: 0   0
    5: 1   0
   ---      
 9996: 1   0
 9997: 0   0
 9998: 0   0
 9999: 0   0
10000: 1   0

I want to generate a sequence (1, 2, 3,..., n) that resets whenever R changes from the previous row. Think of it like I'm counting a streak of random numbers.

So the above would then look like:

       R Seq
    1: 1   1
    2: 1   2
    3: 0   1
    4: 0   2
    5: 1   1
   ---      
 9996: 1   5
 9997: 0   1
 9998: 0   2
 9999: 0   3
10000: 1   1

Thoughts?

Solution

Here is an option:

set.seed(1)
DT <- data.table(R=sample(0:1, 10000, replace=TRUE), Seq=0L)
DT[, Seq:=seq(.N), by=list(cumsum(c(0, abs(diff(R)))))]
DT

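We create a counter that increments every time your 0-1 variable changes, using cumsum(abs(diff(R))). The c(0, ...) part is needed because diff() returns one element fewer than its input; prepending 0 gives a vector the same length as R. We then group by that counter with by, and seq(.N) numbers the rows within each group. This produces: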

       R Seq
    1: 0   1
    2: 0   2
    3: 1   1
    4: 1   2
    5: 0   1
   ---      
 9996: 1   1
 9997: 0   1
 9998: 1   1
 9999: 1   2
10000: 1   3
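To see the grouping key in isolation, here is a minimal base R sketch that needs no data.table. It assumes a short, hand-written vector R chosen to match the first ten rows of the breakdown table in the EDIT below; ave() stands in for data.table's by here and is not what the answer itself uses.

```r
# Hypothetical short input: a hand-written 0/1 vector standing in for DT$R
R <- c(0, 0, 1, 1, 0, 1, 1, 1, 1, 0)

# diff() returns one element fewer than its input...
length(diff(R))          # 9
# ...so a leading 0 restores the original length (row 1 never counts as a change)
changes <- c(0, abs(diff(R)))

# Running total of changes: constant within a streak, +1 at every switch
grp <- cumsum(changes)
grp                      # 0 0 1 1 2 3 3 3 3 4

# Number the rows within each group; ave() plays the role of data.table's `by`
Seq <- ave(seq_along(R), grp, FUN = seq_along)
Seq                      # 1 2 1 2 1 1 2 3 4 1
```

The grp values line up with the cumsum column in the EDIT's breakdown, and Seq restarts at 1 exactly where R switches.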

EDIT: Addressing a request for clarification:

Let's look at the computation I'm using in by, broken down into two new columns:

DT[, diff:=c(0, diff(R))]
DT[, cumsum:=cumsum(abs(diff))]
print(DT, topn=10)

Produces:

       R Seq diff cumsum
    1: 0   1    0      0
    2: 0   2    0      0
    3: 1   1    1      1
    4: 1   2    0      1
    5: 0   1   -1      2
    6: 1   1    1      3
    7: 1   2    0      3
    8: 1   3    0      3
    9: 1   4    0      3
   10: 0   1   -1      4
   ---                  
 9991: 1   2    0   5021
 9992: 1   3    0   5021
 9993: 1   4    0   5021
 9994: 1   5    0   5021
 9995: 0   1   -1   5022
 9996: 1   1    1   5023
 9997: 0   1   -1   5024
 9998: 1   1    1   5025
 9999: 1   2    0   5025
10000: 1   3    0   5025

You can see how the cumulative sum of the absolute value of diff increments by one each time R changes. We can then use that cumsum column to break the data.table into chunks and, for each chunk, generate a sequence with seq(.N) that counts up to the number of rows in the chunk (.N is exactly that: the number of rows in each by group).
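The same chunk-and-count idea can also be expressed with base R's run-length encoding: rle() reports the length of each streak directly, and sequence() turns those lengths into 1..n counters. This is a swapped-in alternative technique, not the answer's method. The sketch below assumes R is an atomic vector without NA values (the two approaches treat NAs differently), and the function names are hypothetical. It cross-checks both versions against a plain loop that restarts at 1 whenever R differs from the previous row.

```r
# Alternative sketch (base R only): build the streak counter from run lengths.
# Assumption: `R` is an atomic vector without NA values.
streak_seq_rle <- function(R) {
  sequence(rle(R)$lengths)   # e.g. lengths 2,2,1,4,1 -> 1 2 1 2 1 1 2 3 4 1
}

# The answer's cumsum/diff grouping, written without data.table
streak_seq_diff <- function(R) {
  grp <- cumsum(c(0, abs(diff(R))))
  ave(seq_along(R), grp, FUN = seq_along)
}

# Plain-loop reference: restart at 1 whenever R differs from the previous row
streak_seq_loop <- function(R) {
  out <- integer(length(R))
  for (i in seq_along(R)) {
    out[i] <- if (i > 1 && R[i] == R[i - 1]) out[i - 1] + 1L else 1L
  }
  out
}

set.seed(1)
R <- sample(0:1, 10000, replace = TRUE)
stopifnot(
  identical(streak_seq_rle(R), streak_seq_loop(R)),
  identical(streak_seq_diff(R), streak_seq_loop(R))
)
```

Inside a data.table, the rle() version can be assigned without any by clause, e.g. DT[, Seq := sequence(rle(R)$lengths)]. If your data.table version provides rleid(), it builds the same kind of run-based group key, so by = rleid(R) could replace the hand-written cumsum expression.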
