Count number of records and generate row number within each group

Problem description

I have the following data.table

library(data.table)
set.seed(1)
DT <- data.table(VAL = sample(c(1, 2, 3), 10, replace = TRUE))
    VAL
 1:   1
 2:   2
 3:   2
 4:   3
 5:   1
 6:   3
 7:   3
 8:   2
 9:   2
10:   1

Within each number in VAL I want to:

  1. Count the number of records/rows
  2. Create a row index (counter) for the first, second, third occurrence, etc.

At the end I want the result

    VAL COUNT IDX
 1:   1     3   1
 2:   2     4   1
 3:   2     4   2
 4:   3     3   1
 5:   1     3   2
 6:   3     3   2
 7:   3     3   3
 8:   2     4   3
 9:   2     4   4
10:   1     3   3

where "COUNT" is the number of records/rows for each "VAL", and "IDX" is the row index within each "VAL".

I tried to work with which and length using .I:

 dt[, list(COUNT = length(VAL == VAL[.I]), 
             IDX = which(which(VAL == VAL[.I]) == .I))]

but this does not work, because .I refers to a vector of row indices, so I guess one must use .I[]. Inside .I[], though, I face the same problem again: I do not have the row index. I also know (from reading the data.table FAQ and following the posts here) that looping through rows should be avoided if possible.

So, what's the data.table way?

Solution

Using .N...

DT[ , `:=`( COUNT = .N , IDX = 1:.N ) , by = VAL ]
#    VAL COUNT IDX
# 1:   1     3   1
# 2:   2     4   1
# 3:   2     4   2
# 4:   3     3   1
# 5:   1     3   2
# 6:   3     3   2
# 7:   3     3   3
# 8:   2     4   3
# 9:   2     4   4
#10:   1     3   3

.N is the number of records in each group, with groups defined by "VAL".
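
For anyone who wants to check this without relying on the random seed, here is a minimal sketch of the same idea. It assumes the VAL values are typed in directly from the table printed in the question; seq_len(.N) is used in place of 1:.N purely as a matter of habit, and the IDX2 column is a hypothetical name added only to illustrate rowid(), which should give the same within-group counter without a by clause.

```r
library(data.table)

# VAL values copied from the question's printed table, so the result
# does not depend on the random number generator.
DT <- data.table(VAL = c(1, 2, 2, 3, 1, 3, 3, 2, 2, 1))

# Add both columns by reference, computed once per distinct VAL:
# .N is the group size, seq_len(.N) counts 1..N inside the group.
DT[, `:=`(COUNT = .N, IDX = seq_len(.N)), by = VAL]

# rowid() builds the same per-group counter without grouping.
# IDX2 is an illustrative column name, not part of the original answer.
DT[, IDX2 := rowid(VAL)]

print(DT)
```

Because `:=` modifies DT by reference, the original row order is kept, which is why IDX follows the order of occurrence within each VAL.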
