Count number of unique rows based on two columns, by group

Question

I have a data.table in R:

    col1 col2 col3   col4
 1:  5.1  3.5  1.4 setosa
 2:  5.1  3.5  1.4 setosa
 3:  4.7  3.2  1.3 setosa
 4:  4.6  3.1  1.5 setosa
 5:  5.0  3.6  1.4 setosa
 6:  5.1  3.5  3.4    eer
 7:  5.1  3.5  3.4    eer
 8:  5.1  3.2  1.3    eer
 9:  5.1  3.5  1.5    eer
10:  5.1  3.5  1.4    eer


DT <- structure(list(col1 = c(5.1, 5.1, 4.7, 4.6, 5, 5.1, 5.1, 5.1, 
5.1, 5.1), col2 = c(3.5, 3.5, 3.2, 3.1, 3.6, 3.5, 3.5, 3.2, 3.5, 
3.5), col3 = c(1.4, 1.4, 1.3, 1.5, 1.4, 3.4, 3.4, 1.3, 1.5, 1.4
), col4 = structure(c(1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L), .Label = c("setosa", 
"versicolor", "virginica", "eer"), class = "factor")), .Names = c("col1", 
"col2", "col3", "col4"), row.names = c(NA, -10L), class = c("data.table", 
"data.frame"))

I want to count unique (distinct) combinations of col1 and col2 for each value of col4.

The expected output is:

    col1 col2 col3   col4 count
 1:  5.1  3.5  1.4 setosa     4
 2:  5.1  3.5  1.4 setosa     4
 3:  4.7  3.2  1.3 setosa     4
 4:  4.6  3.1  1.5 setosa     4
 5:  5.0  3.6  1.4 setosa     4
 6:  5.1  3.5  3.4    eer     2
 7:  5.1  3.5  3.4    eer     2
 8:  5.1  3.2  1.3    eer     2
 9:  5.1  3.5  1.5    eer     2
10:  5.1  3.5  1.4    eer     2

How can I do this in a single data.table expression?

Recommended answer

I had to go through a few attempts first and ended up with this. Any good?

DT[, count:=nrow(unique(.SD)), by=col4, .SDcols=c("col1","col2")]
DT
    col1 col2 col3   col4 count
 1:  5.1  3.5  1.4 setosa     4
 2:  5.1  3.5  1.4 setosa     4
 3:  4.7  3.2  1.3 setosa     4
 4:  4.6  3.1  1.5 setosa     4
 5:  5.0  3.6  1.4 setosa     4
 6:  5.1  3.5  3.4    eer     2
 7:  5.1  3.5  3.4    eer     2
 8:  5.1  3.2  1.3    eer     2
 9:  5.1  3.5  1.5    eer     2
10:  5.1  3.5  1.4    eer     2
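
To see what that expression computes for each group before it is assigned back, the same `j` expression can be run without `:=`, which returns one row per group. The following is a minimal sketch, assuming the data.table package is installed; it rebuilds the question's data with `data.table()` so that it runs on its own, and the name `per_group` is only illustrative.

```r
library(data.table)

# Same values as the question's DT, rebuilt so this snippet is self-contained
DT <- data.table(
  col1 = c(5.1, 5.1, 4.7, 4.6, 5.0, 5.1, 5.1, 5.1, 5.1, 5.1),
  col2 = c(3.5, 3.5, 3.2, 3.1, 3.6, 3.5, 3.5, 3.2, 3.5, 3.5),
  col3 = c(1.4, 1.4, 1.3, 1.5, 1.4, 3.4, 3.4, 1.3, 1.5, 1.4),
  col4 = factor(rep(c("setosa", "eer"), each = 5),
                levels = c("setosa", "versicolor", "virginica", "eer"))
)

# Within each col4 group, .SD contains only the columns named in .SDcols,
# so unique(.SD) keeps the distinct (col1, col2) pairs and nrow() counts them.
per_group <- DT[, .(count = nrow(unique(.SD))),
                by = col4, .SDcols = c("col1", "col2")]
print(per_group)
```

With `:=` in place of `.(count = ...)`, that single per-group value is recycled to every row of the group, which is how the `count` column above is filled.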

The same, but faster, thanks to Procrastinatus's comment below:

DT[, count:=uniqueN(.SD), by=col4, .SDcols=c("col1","col2")]
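
As a quick sanity check, the `uniqueN()` form can be compared against the `nrow(unique(.SD))` form on the question's data. This is a sketch only, assuming a data.table version that provides `uniqueN()`; the helper column `count_check` is not part of the original answer and exists just for the comparison.

```r
library(data.table)

# Same values as the question's DT, rebuilt so this snippet is self-contained
DT <- data.table(
  col1 = c(5.1, 5.1, 4.7, 4.6, 5.0, 5.1, 5.1, 5.1, 5.1, 5.1),
  col2 = c(3.5, 3.5, 3.2, 3.1, 3.6, 3.5, 3.5, 3.2, 3.5, 3.5),
  col3 = c(1.4, 1.4, 1.3, 1.5, 1.4, 3.4, 3.4, 1.3, 1.5, 1.4),
  col4 = factor(rep(c("setosa", "eer"), each = 5),
                levels = c("setosa", "versicolor", "virginica", "eer"))
)

# uniqueN(.SD) counts distinct rows of the (col1, col2) subset per group
DT[, count := uniqueN(.SD), by = col4, .SDcols = c("col1", "col2")]

# Hypothetical cross-check column using the first approach
DT[, count_check := nrow(unique(.SD)), by = col4, .SDcols = c("col1", "col2")]

# Both forms should agree row for row
stopifnot(identical(DT$count, DT$count_check))
print(DT)
```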
