Tidyverse:过滤分组数据框中的n个最大组 [英] Tidyverse: filtering n largest groups in grouped dataframe

查看:222
本文介绍了Tidyverse:过滤分组数据框中的n个最大组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想根据计数过滤n个最大的组,然后对过滤后的数据帧进行一些计算

I want to filter the n largest groups based on count, and then do some calculations on the filtered dataframe

这是一些数据

Brand <- c("A","B","C","A","A","B","A","A","B","C")
Category <- c(1,2,1,1,2,1,2,1,2,1)
Clicks <- c(10,11,12,13,14,15,14,13,12,11)
df <- data.frame(Brand,Category,Clicks)

|Brand | Category| Clicks|
|:-----|--------:|------:|
|A     |        1|     10|
|B     |        2|     11|
|C     |        1|     12|
|A     |        1|     13|
|A     |        2|     14|
|B     |        1|     15|
|A     |        2|     14|
|A     |        1|     13|
|B     |        2|     12|
|C     |        1|     11|

这是我的预期输出.我想按计数过滤出两个最大的品牌,然后找到每个品牌/类别组合中的平均点击次数

This is my expected output. I want to filter out the two largest brands by count and then find the mean clicks in each brand / category combination

|Brand | Category| mean_clicks|
|:-----|--------:|-----------:|
|A     |        1|        12.0|
|A     |        2|        14.0|
|B     |        1|        15.0|
|B     |        2|        11.5|

我以为这样的代码可以实现(但不能)

Which I thought could be achieved with code like this (but can't)

df %>%
  group_by(Brand, Category) %>%
  top_n(2, Brand) %>% # Largest 2 brands by count
  summarise(mean_clicks = mean(Clicks))

理想的答案应该能够在数据库表和本地表上使用

the ideal answer should be able to be used on database tables as well as local tables

推荐答案

另一个使用join过滤数据帧的dplyr解决方案:

Another dplyr solution using a join to filter the data frame:

library(dplyr)

df %>%
  group_by(Brand) %>%
  summarise(n = n()) %>%
  top_n(2) %>% # select top 2
  left_join(df, by = "Brand") %>% # filters out top 2 Brands
  group_by(Brand, Category) %>%
  summarise(mean_clicks = mean(Clicks))

# # A tibble: 4 x 3
# # Groups:   Brand [?]
#   Brand Category mean_clicks
#   <fct>    <dbl>       <dbl>
# 1 A            1        12  
# 2 A            2        14  
# 3 B            1        15  
# 4 B            2        11.5

这篇关于Tidyverse:过滤分组数据框中的n个最大组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆