How to group by multiple columns


Question

I want to group my DataFrame by UserId, Date, and category. For each group I need the frequency of use per day (the count per category), the maximum duration per category, and the part of the day when it is used most. Finally, I want to store the result in a .csv file.

name       duration  UserId  category       part_of_day  Date
Settings      3.436       1  System tool    evening      2020-09-10
Calendar      2.167       1  Calendar       night        2020-09-11
Calendar      5.705       1  Calendar       night        2020-09-11
Messages      7.907       1  Phone_and_SMS  night        2020-09-11
Instagram    50.285       9  Social         night        2020-09-28
Drive        30.260       9  Productivity   night        2020-09-28

df.groupby(["UserId", "Date","category"])["category"].count()  

The result of my code is:

  UserId      Date        category               
    1       2020-09-10    System tool                  1
           2020-09-11     Calendar                     8
                         Clock                         2
                        Communication                  86
                      Health & Fitness                 5     
 

But I want this result:

  UserId      Date        category             count(category)  max-duration 
    1       2020-09-10  System tool                  1            3

            2020-09-11  Calendar                     2            5

    2       2020-09-28    Social                     1            50
                          Productivity               1            30

How can I do that? None of the solutions I found gives the result I want.

Recommended answer

It looks like you want to calculate several statistics for each group. Calling agg on the grouped object lets you apply a different aggregation to each column in one pass.

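# Group rows by user, date, and category.
grouped = df.groupby(["UserId", "Date", "category"])

# Per group: count the rows and take the longest duration.
result = grouped.agg({'category': 'count', 'duration': 'max'})
result.columns = ['group_count', 'duration_max']

# Turn the group keys back into ordinary columns.
result = result.reset_index()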

result
   UserId        Date       category  group_count  duration_max
0       1  2020-09-10    System tool            1         3.436
1       1  2020-09-11       Calendar            2         5.705
2       1  2020-09-11  Phone_and_SMS            1         7.907
3       9  2020-09-28   Productivity            1        30.260
4       9  2020-09-28         Social            1        50.285
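
The question also asks for the most-used part of the day and for a .csv file, which the snippet above does not cover. The following is a minimal sketch of one way to add both, not the only way. It rebuilds the sample rows from the question so it can run on its own. The helper most_common, the column name top_part_of_day, and the file name usage_summary.csv are made up for illustration. It counts rows through the name column, on the assumption that name is never missing.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "name": ["Settings", "Calendar", "Calendar", "Messages", "Instagram", "Drive"],
    "duration": [3.436, 2.167, 5.705, 7.907, 50.285, 30.260],
    "UserId": [1, 1, 1, 1, 9, 9],
    "category": ["System tool", "Calendar", "Calendar",
                 "Phone_and_SMS", "Social", "Productivity"],
    "part_of_day": ["evening", "night", "night", "night", "night", "night"],
    "Date": ["2020-09-10", "2020-09-11", "2020-09-11",
             "2020-09-11", "2020-09-28", "2020-09-28"],
})


def most_common(values: pd.Series):
    """Hypothetical helper: return the most frequent value in a group.

    Series.mode() returns all modes in sorted order, so ties resolve to
    the first one. Assumes the group has at least one non-null value.
    """
    return values.mode().iloc[0]


# Named aggregation: each keyword becomes an output column,
# built from (source column, aggregation).
result = (
    df.groupby(["UserId", "Date", "category"], as_index=False)
      .agg(
          group_count=("name", "count"),        # rows per group (non-null names)
          duration_max=("duration", "max"),     # longest duration per group
          top_part_of_day=("part_of_day", most_common),
      )
)

# Write the summary without the integer row index.
result.to_csv("usage_summary.csv", index=False)
```

Named aggregation sets the output column names directly, so the separate result.columns assignment is not needed, and as_index=False keeps the group keys as ordinary columns in place of reset_index.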
