How to group by multiple columns
Question
I want to group my dataframe by UserId, Date, and category, and for each group get the frequency of use per day (the count per category), the maximum duration per category, and the part of the day when it is used most. Finally, I want to store the result in a .csv file.
name duration UserId category part_of_day Date
Settings 3.436 1 System tool evening 2020-09-10
Calendar 2.167 1 Calendar night 2020-09-11
Calendar 5.705 1 Calendar night 2020-09-11
Messages 7.907 1 Phone_and_SMS night 2020-09-11
Instagram 50.285 9 Social night 2020-09-28
Drive 30.260 9 Productivity night 2020-09-28
df.groupby(["UserId", "Date","category"])["category"].count()
The result of my code is:
UserId Date category
1 2020-09-10 System tool 1
2020-09-11 Calendar 8
Clock 2
Communication 86
Health & Fitness 5
But I want this result:
UserId Date category count(category) max-duration
1 2020-09-10 System tool 1 3
2020-09-11 Calendar 2 5
2 2020-09-28 Social 1 50
Productivity 1 30
How can I do that? I could not find a solution that produces the result I want.
Recommended answer
It looks like you want to calculate several statistics for each group. You can do that by passing a dictionary to agg that maps each column to the aggregation to apply:
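# Group rows by user, date, and category
grouped = df.groupby(["UserId", "Date", "category"])
# Count the rows in each group and take the longest duration
result = grouped.agg({'category': 'count', 'duration': 'max'})
result.columns = ['group_count', 'duration_max']
# Turn the group keys back into ordinary columns
result = result.reset_index()
result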
UserId Date category group_count duration_max
0 1 2020-09-10 System tool 1 3.436
1 1 2020-09-11 Calendar 2 5.705
2 1 2020-09-11 Phone_and_SMS 1 7.907
3 9 2020-09-28 Productivity 1 30.260
4 9 2020-09-28 Social 1 50.285
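The question also asks for the most-used part of the day and for CSV output, which the code above does not cover. The following is a minimal sketch of one way to add both, assuming a pandas version that supports named aggregation (0.25 or later). The helper most_common and the column name top_part_of_day are made up for illustration, and the sample rows are copied from the question.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame(
    {
        "name": ["Settings", "Calendar", "Calendar", "Messages", "Instagram", "Drive"],
        "duration": [3.436, 2.167, 5.705, 7.907, 50.285, 30.260],
        "UserId": [1, 1, 1, 1, 9, 9],
        "category": ["System tool", "Calendar", "Calendar",
                     "Phone_and_SMS", "Social", "Productivity"],
        "part_of_day": ["evening", "night", "night", "night", "night", "night"],
        "Date": ["2020-09-10", "2020-09-11", "2020-09-11",
                 "2020-09-11", "2020-09-28", "2020-09-28"],
    }
)


def most_common(values: pd.Series) -> str:
    """Hypothetical helper: most frequent value; ties go to the first in sorted order."""
    return values.mode().iloc[0]


# Named aggregation: each keyword becomes an output column,
# defined as (input column, aggregation).
result = df.groupby(["UserId", "Date", "category"], as_index=False).agg(
    group_count=("name", "count"),          # non-null rows per group
    duration_max=("duration", "max"),       # longest duration per group
    top_part_of_day=("part_of_day", most_common),
)

# With no path, to_csv returns the CSV text; pass a file path
# (for example "usage_summary.csv", a made-up name) to write a file instead.
csv_text = result.to_csv(index=False)
print(csv_text)
```

Counting the name column rather than category avoids aggregating a column that is also a grouping key. If name can contain missing values, "size" may be a better choice than "count", since "count" skips nulls.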