Count number of records and generate row number within each group in a data.table
Question
I have the following data.table:
library(data.table)
set.seed(1)
DT <- data.table(VAL = sample(c(1, 2, 3), 10, replace = TRUE))
VAL
1: 1
2: 2
3: 2
4: 3
5: 1
6: 3
7: 3
8: 2
9: 2
10: 1
Within each VAL I want to:
- count the number of records/rows
- create a row index (first, second, third occurrence, etc.)
In the end I want this result:
VAL COUNT IDX
1: 1 3 1
2: 2 4 1
3: 2 4 2
4: 3 3 1
5: 1 3 2
6: 3 3 2
7: 3 3 3
8: 2 4 3
9: 2 4 4
10: 1 3 3
where COUNT is the number of records/rows for each VAL, and IDX is the row index within each VAL.
I tried using which and length together with .I:
DT[, list(COUNT = length(VAL == VAL[.I]),
          IDX = which(which(VAL == VAL[.I]) == .I))]
but this does not work, because .I refers to a vector of row indices, so I guess one must use .I[]. Inside .I[], though, I face the same problem again: I do not have the row index. I also know (from reading the data.table FAQ and following the posts here) that looping through rows should be avoided if possible.
So, what's the data.table way?
Recommended answer
Use .N together with by = VAL:
DT[ , `:=`( COUNT = .N , IDX = 1:.N ) , by = VAL ]
# VAL COUNT IDX
# 1: 1 3 1
# 2: 2 4 1
# 3: 2 4 2
# 4: 3 3 1
# 5: 1 3 2
# 6: 3 3 2
# 7: 3 3 3
# 8: 2 4 3
# 9: 2 4 4
#10: 1 3 3
.N is the number of records in each group, with groups defined by VAL. Because .N is evaluated once per group, 1:.N counts from 1 up to the group size, which gives the row index within each VAL.
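As a small sketch of the same idea, the snippet below types the VAL column in directly instead of sampling it, so the result does not depend on the random number generator (the default behaviour of sample() changed in R 3.6.0, so set.seed(1) may not reproduce the values shown in the question). It also uses seq_len(.N) in place of 1:.N and adds a comparison column via rowid(), which data.table provides for within-group counters. The column name IDX2 is made up for this illustration.

```r
library(data.table)

# Same values as in the question, entered by hand rather than sampled
DT <- data.table(VAL = c(1, 2, 2, 3, 1, 3, 3, 2, 2, 1))

# Add both columns by reference; .N is the size of the current VAL group
DT[, `:=`(COUNT = .N, IDX = seq_len(.N)), by = VAL]

# rowid() yields the same within-group counter without a by= clause
# (IDX2 is a hypothetical column name used only for comparison)
DT[, IDX2 := rowid(VAL)]

print(DT)
```

seq_len(.N) behaves like 1:.N for non-empty groups, and rowid(VAL) can be handy when the counter is needed inside a larger expression where adding by = would be awkward.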