How to add count of unique values by group to R data.frame

Question

I wish to count the number of unique values by grouping of a second variable, and then add the count to the existing data.frame as a new column. For example, if the existing data frame looks like this:

   color  type
1  black chair
2  black chair
3  black  sofa
4  green  sofa
5  green  sofa
6    red  sofa
7    red plate
8   blue  sofa
9   blue plate
10  blue chair

I want to add for each color, the count of unique types that are present in the data:

   color  type unique_types
1  black chair            2
2  black chair            2
3  black  sofa            2
4  green  sofa            1
5  green  sofa            1
6    red  sofa            2
7    red plate            2
8   blue  sofa            3
9   blue plate            3
10  blue chair            3

I was hoping to use ave, but can't seem to find a straightforward method that doesn't require many lines. I have >100,000 rows, so am also not sure how important efficiency is.

It's somewhat similar to this issue: Count number of observations/rows per group and add result to data frame

Recommended answer

Using ave (since you asked for it specifically):

within(df, { count <- ave(type, color, FUN=function(x) length(unique(x)))})

Make sure that type is a character vector and not a factor.
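
To make the character-vs-factor point concrete, here is a minimal sketch that rebuilds the example data from the question and applies the same ave call. The column name unique_types is taken from the desired output in the question; the as.integer() wrapper is an addition of mine, on the assumption that a numeric count is wanted, since ave returns a vector of the same type as its first argument (here character).

```r
# Example data from the question; stringsAsFactors = FALSE keeps
# both columns as character vectors on every R version.
df <- data.frame(
  color = c("black", "black", "black", "green", "green",
            "red", "red", "blue", "blue", "blue"),
  type  = c("chair", "chair", "sofa", "sofa", "sofa",
            "sofa", "plate", "sofa", "plate", "chair"),
  stringsAsFactors = FALSE
)

# ave() writes each group's result back into a copy of `type`,
# so the counts come back as character strings ("2", "1", ...).
counts_chr <- ave(df$type, df$color, FUN = function(x) length(unique(x)))

# Convert to integer so the new column is numeric.
df$unique_types <- as.integer(counts_chr)

print(df)
```

If type were a factor, the per-group counts would be assigned into a factor and would not match any existing level, which is why the conversion to character matters.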

Since you also say your data is huge and that speed/performance may therefore be a factor, I'd suggest a data.table solution as well.

require(data.table)
setDT(df)[, count := uniqueN(type), by = color] # v1.9.6+
# if you don't want df to be modified by reference
ans = as.data.table(df)[, count := uniqueN(type), by = color]

uniqueN was implemented in v1.9.6 and is a faster equivalent of length(unique(.)). In addition it also works with data.frames/data.tables.

Other solutions:

Using plyr:

require(plyr)
ddply(df, .(color), mutate, count = length(unique(type)))

Using aggregate:

agg <- aggregate(data=df, type ~ color, function(x) length(unique(x)))
merge(df, agg, by="color", all=TRUE)
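
One thing to watch with the aggregate approach: both df and agg carry a column named type, so merge() disambiguates them with its default suffixes (type.x and type.y), and by default it also sorts the result by the key. A hedged variant that renames the aggregated column first is sketched below; the name unique_types comes from the question, and the rest is just one way to tidy the result, not part of the original answer.

```r
# Example data from the question.
df <- data.frame(
  color = c("black", "black", "black", "green", "green",
            "red", "red", "blue", "blue", "blue"),
  type  = c("chair", "chair", "sofa", "sofa", "sofa",
            "sofa", "plate", "sofa", "plate", "chair"),
  stringsAsFactors = FALSE
)

# Count distinct types per color.
agg <- aggregate(type ~ color, data = df,
                 FUN = function(x) length(unique(x)))

# Rename the count column so it does not collide with df$type.
names(agg)[names(agg) == "type"] <- "unique_types"

# Join the counts back onto the original rows.
# Note: merge() sorts by `color`, so row order differs from df.
out <- merge(df, agg, by = "color")

print(out)
```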
