Count number of records and generate row number within each group in a data.table

Question

I have the following data.table

library(data.table)
set.seed(1)
DT <- data.table(VAL = sample(c(1, 2, 3), 10, replace = TRUE))
DT
    VAL
 1:   1
 2:   2
 3:   2
 4:   3
 5:   1
 6:   3
 7:   3
 8:   2
 9:   2
10:   1

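For each number in VAL I would like to:

  1. Count the number of records/rows
  2. Create a row index (counter) for the first, second, third occurrence, and so on.

In the end I want this result: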

    VAL COUNT IDX
 1:   1     3   1
 2:   2     4   1
 3:   2     4   2
 4:   3     3   1
 5:   1     3   2
 6:   3     3   2
 7:   3     3   3
 8:   2     4   3
 9:   2     4   4
10:   1     3   3

where "COUNT" is the number of records/rows for each "VAL", and "IDX" is the row index within each "VAL".

I tried to work with which and length using .I:

 DT[, list(COUNT = length(VAL == VAL[.I]),
           IDX = which(which(VAL == VAL[.I]) == .I))]

But this does not work, because .I is a vector of row indices, so I guess one must use .I[]. Inside .I[], though, I face the same problem again: I do not have the row index, and I know (from reading the data.table FAQ and following the posts here) that looping through rows should be avoided if possible.
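To see what .I actually holds, here is a minimal sketch. The values are typed in by hand to match the table above, since sample() with this seed may return different draws on other R versions; the names no_by and rows_by_group are made up for illustration.

```r
library(data.table)

# Hand-typed copy of the example column (an assumption, to avoid
# depending on the random number generator).
DT <- data.table(VAL = c(1, 2, 2, 3, 1, 3, 3, 2, 2, 1))

# Without `by`, .I is simply 1:nrow(DT), so VAL[.I] is VAL itself and
# the comparison is TRUE for every row: the "count" is the full row count.
no_by <- DT[, list(COUNT = length(VAL == VAL[.I]))]

# With `by`, .I holds the original row numbers that belong to each group.
rows_by_group <- DT[, .(ROW = .I), by = VAL]
```

So .I only becomes group-aware once a `by` is supplied, and even then it gives positions in the whole table rather than a counter within the group.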

So, what is the data.table way?

Recommended answer

Using .N...

DT[ , `:=`( COUNT = .N , IDX = 1:.N ) , by = VAL ]
#    VAL COUNT IDX
# 1:   1     3   1
# 2:   2     4   1
# 3:   2     4   2
# 4:   3     3   1
# 5:   1     3   2
# 6:   3     3   2
# 7:   3     3   3
# 8:   2     4   3
# 9:   2     4   4
#10:   1     3   3

.N is the number of records in each group, with groups defined by "VAL".
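As a self-contained check, the sketch below rebuilds the table from hand-typed values (an assumption, so the result does not depend on the random seed) and adds both columns by reference. It uses seq_len(.N) in place of 1:.N, which is equivalent here, and also shows rowid() as one possible alternative for the counter; the column name IDX2 is made up for the comparison.

```r
library(data.table)

# Hand-typed copy of the example column from the question.
DT <- data.table(VAL = c(1, 2, 2, 3, 1, 3, 3, 2, 2, 1))

# Grouped by VAL: .N is the group size, seq_len(.N) counts 1..N
# within the group. `:=` adds both columns by reference.
DT[, `:=`(COUNT = .N, IDX = seq_len(.N)), by = VAL]

# rowid() builds the same within-group counter without a `by`.
DT[, IDX2 := rowid(VAL)]

print(DT)
```

Because `:=` modifies DT in place, the original row order is kept and no reassignment is needed.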
