Count the number of records and generate row numbers within each group in a data.table
Question
I have the following data.table:
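library(data.table)
set.seed(1)  # output below matches R < 3.6.0; R 3.6.0 changed sample()'s default algorithm
DT <- data.table(VAL = sample(c(1, 2, 3), 10, replace = TRUE))
DT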
    VAL
 1:   1
 2:   2
 3:   2
 4:   3
 5:   1
 6:   3
 7:   3
 8:   2
 9:   2
10:   1
For each number in VAL I want to:
- count the number of records/rows
- create a row index (counter) for the first, second, third, etc. occurrence.
In the end I want this result:
    VAL COUNT IDX
 1:   1     3   1
 2:   2     4   1
 3:   2     4   2
 4:   3     3   1
 5:   1     3   2
 6:   3     3   2
 7:   3     3   3
 8:   2     4   3
 9:   2     4   4
10:   1     3   3
where "COUNT" is the number of records/rows for each "VAL", and "IDX" is the row index within each "VAL".
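To make the two definitions concrete, here is a minimal base-R sketch (not data.table, and not part of the original question) that computes the same COUNT and IDX columns with ave(), which applies a function within each group and returns a vector aligned with the input rows. The VAL values are typed in directly from the output above, so the sketch does not depend on the random seed.

```r
# Values copied from the printed DT above (avoids depending on set.seed/sample behaviour)
VAL <- c(1, 2, 2, 3, 1, 3, 3, 2, 2, 1)

# COUNT: ave() splits VAL by its own values, applies length() to each group,
# and writes the (recycled) result back to every row of that group
COUNT <- ave(VAL, VAL, FUN = length)

# IDX: seq_along() numbers the rows 1, 2, ... within each group, in original row order
IDX <- ave(VAL, VAL, FUN = seq_along)

result <- data.frame(VAL, COUNT, IDX)
print(result)
```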
I tried to work with which and length using .I:
DT[, list(COUNT = length(VAL == VAL[.I]),
          IDX = which(which(VAL == VAL[.I]) == .I))]
But this does not work, because .I refers to a vector of row indices, so I guess one must use .I[]. Inside .I[], however, I again face the problem that I do not have the row index, and I know (from reading the data.table FAQ and following the posts here) that looping through rows should be avoided if possible.
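A small sketch, assuming the same VAL values typed in directly, shows what .I actually contains and why the attempt cannot work: without by, .I is simply 1:nrow(DT), so VAL[.I] is VAL itself and the comparison is TRUE on every row; with by = VAL, .I holds the original row numbers of the rows in each group. The names all_rows, attempt_count and grp_rows are only for illustration.

```r
library(data.table)

DT <- data.table(VAL = c(1, 2, 2, 3, 1, 3, 3, 2, 2, 1))

# Without `by`, .I is just 1:nrow(DT), so VAL[.I] is VAL itself
all_rows <- DT[, .I]

# Hence VAL == VAL[.I] is TRUE for every row and its length is nrow(DT),
# which is why the attempt cannot produce per-value counts
attempt_count <- DT[, length(VAL == VAL[.I])]

# With `by = VAL`, .I holds the original row numbers of each group's rows
grp_rows <- DT[, .(ROWS = list(.I)), by = VAL]
print(grp_rows)
```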
So, what is the data.table way to do this?
Recommended answer
Use .N:
DT[ , `:=`( COUNT = .N , IDX = 1:.N ) , by = VAL ]
DT[]  # := updates DT by reference and returns invisibly, so print it explicitly
#     VAL COUNT IDX
#  1:   1     3   1
#  2:   2     4   1
#  3:   2     4   2
#  4:   3     3   1
#  5:   1     3   2
#  6:   3     3   2
#  7:   3     3   3
#  8:   2     4   3
#  9:   2     4   4
# 10:   1     3   3
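One detail of the answer worth spelling out: := adds the columns to DT by reference, so the rows stay in their original order. The sketch below (same VAL values typed in, with data.table's copy() so the original table is left untouched) contrasts it with a plain j expression grouped by VAL, which instead returns a new table whose rows are collected group by group. seq_len(.N) is used here as an equivalent of 1:.N; DT2 and grouped are illustrative names.

```r
library(data.table)

DT <- data.table(VAL = c(1, 2, 2, 3, 1, 3, 3, 2, 2, 1))

# := adds COUNT and IDX by reference; copy() keeps DT itself unchanged
DT2 <- copy(DT)
DT2[, `:=`(COUNT = .N, IDX = seq_len(.N)), by = VAL]

# A plain j expression returns a new table with the rows of each group
# placed together (all VAL == 1 rows, then 2, then 3)
grouped <- DT[, .(COUNT = .N, IDX = seq_len(.N)), by = VAL]

print(DT2)
print(grouped)
```

Use := when the counter must line up with the existing rows; use the plain j form when a grouped summary table is what you want.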
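.N is the number of records in each group, with groups defined by "VAL". COUNT = .N therefore repeats the group size on every row of the group, and IDX = 1:.N numbers the rows of each group 1, 2, ..., .N in their original order.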
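If only the within-group counter is needed, data.table 1.9.8 and later also provide rowid(), which numbers the occurrences of each value in row order without a by clause. This is an alternative not mentioned in the original answer; a minimal sketch, again with the VAL values typed in:

```r
library(data.table)

DT <- data.table(VAL = c(1, 2, 2, 3, 1, 3, 3, 2, 2, 1))

# COUNT: .N is the size of each VAL group, written to every row of that group
DT[, COUNT := .N, by = VAL]

# IDX: rowid(VAL) gives 1 for the first occurrence of each value, 2 for the second, ...
DT[, IDX := rowid(VAL)]

print(DT)
```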