Group by columns and summarize a column into a list

Problem description

I have a data frame like this:

sample_df<-data.frame(
   client=c('John', 'John','Mary','Mary'),
   date=c('2016-07-13','2016-07-13','2016-07-13','2016-07-13'),
   cluster=c('A','B','A','A'))

#sample data frame
   client date         cluster
1  John   2016-07-13    A 
2  John   2016-07-13    B 
3  Mary   2016-07-13    A 
4  Mary   2016-07-13    A             

I would like to transform it into a different format, like this:

#ideal data frame
   client date         cluster
1  John   2016-07-13    c('A','B')
2  Mary   2016-07-13    A 

The 'cluster' column should hold a list whenever a client belongs to more than one cluster on the same date.

I thought I could do this with the dplyr package, using a command like the one below:

library(dplyr)
ideal_df <- sample_df %>%
    group_by(client, date) %>%
    summarize(
        # some anonymous function
    )

However, I don't know how to write the anonymous function in this case. Is there a way to transform the data into the ideal format?

Recommended answer

We can use toString to concatenate the unique elements of 'cluster' into a single string after grouping by 'client' and 'date':

r1 <- sample_df %>% 
         group_by(client, date) %>%
         summarise(cluster = toString(unique(cluster)))
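To see what the summary function does inside each group, here is a minimal base R sketch applied by hand to the two clients' cluster values. The variable names (john, mary, john_label, mary_label) are made up for illustration and are not part of the original answer.

```r
# Cluster values for each client, copied from sample_df in the question
john <- c("A", "B")
mary <- c("A", "A")

# unique() drops repeated values; toString() joins what is left with ", "
john_label <- toString(unique(john))
mary_label <- toString(unique(mary))

print(john_label)  # "A, B"
print(mary_label)  # "A"
```

Note that the result is a single character string per group, not an R list, so this option fits best when the column is only needed for display.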

Another option is to create a list column:

r2 <- sample_df %>%
         group_by(client, date) %>% 
         summarise(cluster = list(unique(cluster)))
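The list column can be inspected one cell at a time. The sketch below rebuilds the sample data so that it runs on its own. It assumes dplyr 1.0.0 or later for the `.groups` argument, and it adds `stringsAsFactors = FALSE`, which is not in the original, so that the values stay character vectors on older R versions.

```r
library(dplyr)

# Sample data from the question (stringsAsFactors = FALSE is an added assumption)
sample_df <- data.frame(
  client  = c("John", "John", "Mary", "Mary"),
  date    = c("2016-07-13", "2016-07-13", "2016-07-13", "2016-07-13"),
  cluster = c("A", "B", "A", "A"),
  stringsAsFactors = FALSE
)

# Same summary as above; .groups = "drop" returns an ungrouped result
r2 <- sample_df %>%
  group_by(client, date) %>%
  summarise(cluster = list(unique(cluster)), .groups = "drop")

# Each cell of the list column holds that group's unique clusters
john_clusters <- r2$cluster[[1]]
# Number of distinct clusters per row
n_clusters <- lengths(r2$cluster)

print(john_clusters)
print(n_clusters)
```

Unlike the toString version, each cell here is a real vector, so it can be used in later computations such as counting or filtering by cluster membership.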

If needed, we can unnest the list column back into one row per value:

library(tidyr)
r2 %>%
    ungroup() %>%
    unnest(cols = cluster)
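As a rough check of what unnesting produces, the sketch below runs the whole pipeline on the sample data. It assumes tidyr 1.0.0 or later (for the `cols` argument) and dplyr 1.0.0 or later (for `.groups`). The name `long` is hypothetical.

```r
library(dplyr)
library(tidyr)

# Sample data from the question (stringsAsFactors = FALSE is an added assumption)
sample_df <- data.frame(
  client  = c("John", "John", "Mary", "Mary"),
  date    = c("2016-07-13", "2016-07-13", "2016-07-13", "2016-07-13"),
  cluster = c("A", "B", "A", "A"),
  stringsAsFactors = FALSE
)

# Build the list column, then expand it again
long <- sample_df %>%
  group_by(client, date) %>%
  summarise(cluster = list(unique(cluster)), .groups = "drop") %>%
  unnest(cols = cluster)

# One row per distinct client/date/cluster combination
print(long)
```

The unnested result has three rows (John A, John B, Mary A): Mary's duplicate row is gone because unique() was applied before nesting.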
