Group by columns and summarize a column into a list

Published: 2020/10/26 2:39:38 · Tags: r, group-by, dplyr


Question

I have a dataframe like this:

sample_df<-data.frame(
   client=c('John', 'John','Mary','Mary'),
   date=c('2016-07-13','2016-07-13','2016-07-13','2016-07-13'),
   cluster=c('A','B','A','A'))

#sample data frame
   client date         cluster
1  John   2016-07-13    A 
2  John   2016-07-13    B 
3  Mary   2016-07-13    A 
4  Mary   2016-07-13    A

I would like to transform it into a different format, like this:

#ideal data frame
   client date         cluster
1  John   2016-07-13    c('A', 'B') 
2  Mary   2016-07-13    A

For the 'cluster' column, the value should be a list of clusters whenever a client belongs to more than one cluster on the same date.
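To make the target shape concrete: in R, the usual way to store "a list" inside a single column is a list column, where each cell holds a vector of any length. The sketch below builds the ideal data frame by hand with `tibble::tibble()` (an assumption; the question does not mention tibble) purely to show the structure the grouped summary should produce.

```r
library(tibble)

# Hand-built target: cluster is a list column, so each cell can hold a
# character vector of any length (two clusters for John, one for Mary)
ideal_df <- tibble(
  client  = c("John", "Mary"),
  date    = c("2016-07-13", "2016-07-13"),
  cluster = list(c("A", "B"), "A")
)

print(ideal_df)
```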

I thought I could do it with the dplyr package, using a command like the one below:

library(dplyr)
ideal_df <- sample_df %>% 
    group_by(client, date) %>% 
    summarize(
        # some anonymous function goes here
    )

However, I don't know how to write the anonymous function in this situation. Is there a way to transform the data into the ideal format?

Solution

We can use `toString` to concatenate the unique elements of 'cluster' into a single string after grouping by 'client' and 'date':

r1 <- sample_df %>% 
         group_by(client, date) %>%
         summarise(cluster = toString(unique(cluster)))
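A self-contained sketch of the `toString` approach on the question's data. It assumes dplyr >= 1.0.0 for the `.groups = "drop"` argument, which returns an ungrouped result instead of printing a regrouping message. Note that the result is a plain character column ("A, B"), not a list.

```r
library(dplyr)

sample_df <- data.frame(
  client  = c("John", "John", "Mary", "Mary"),
  date    = c("2016-07-13", "2016-07-13", "2016-07-13", "2016-07-13"),
  cluster = c("A", "B", "A", "A"),
  stringsAsFactors = FALSE
)

# Collapse the distinct clusters of each (client, date) group into one
# comma-separated string; unique() drops Mary's duplicated "A"
r1 <- sample_df %>%
  group_by(client, date) %>%
  summarise(cluster = toString(unique(cluster)), .groups = "drop")

print(r1)
```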

Another option is to create a list column:

r2 <- sample_df %>%
         group_by(client, date) %>% 
         summarise(cluster = list(unique(cluster)))
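A self-contained sketch of the list-column approach, again assuming dplyr >= 1.0.0 for `.groups = "drop"`. Wrapping `unique(cluster)` in `list()` makes each group produce a single cell holding a character vector, which matches the "ideal" format in the question: two values for John, one for Mary.

```r
library(dplyr)

sample_df <- data.frame(
  client  = c("John", "John", "Mary", "Mary"),
  date    = c("2016-07-13", "2016-07-13", "2016-07-13", "2016-07-13"),
  cluster = c("A", "B", "A", "A"),
  stringsAsFactors = FALSE
)

# One row per (client, date); cluster becomes a list column
r2 <- sample_df %>%
  group_by(client, date) %>%
  summarise(cluster = list(unique(cluster)), .groups = "drop")

str(r2$cluster)
# Number of distinct clusters per client/date
lengths(r2$cluster)
```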

which we can later unnest back to one row per cluster:

library(tidyr)
r2 %>%
    ungroup() %>%
    unnest(cols = cluster)
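A self-contained round-trip sketch, assuming dplyr >= 1.0.0 and tidyr >= 1.0.0 (where `unnest()` expects an explicit `cols` argument and warns without one). Because the summary already uses `.groups = "drop"`, no separate `ungroup()` step is needed here. Unnesting yields one row per distinct (client, date, cluster) combination, so Mary's duplicate "A" does not come back.

```r
library(dplyr)
library(tidyr)

sample_df <- data.frame(
  client  = c("John", "John", "Mary", "Mary"),
  date    = c("2016-07-13", "2016-07-13", "2016-07-13", "2016-07-13"),
  cluster = c("A", "B", "A", "A"),
  stringsAsFactors = FALSE
)

r2 <- sample_df %>%
  group_by(client, date) %>%
  summarise(cluster = list(unique(cluster)), .groups = "drop")

# Expand the list column back into one row per element
r3 <- r2 %>%
  unnest(cols = cluster)

print(r3)
```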
