Perform pandas aggregation while keeping the date column intact
Problem description
import pandas as pd

# Raw data: one dict per table, converted to DataFrames below
user = {'id': ['abab23', 'abab21', 'abab22', 'abab25', 'abab24', 'abab30', 'abab252', 'abab15'],
        'dob': ['10-10-1990', '1-12-1993', '12-12-2000', '2-10-1999', '2-10-1999', '2-10-1999', '2-10-1999', '2-10-1999']}
activities = {'sentconn': ['abab35', 'abab15', 'abab25', 'abab23', 'abab22', 'abab15'],
              'receiveconj': ['abab24', 'abab24', 'abab21', 'abab35', 'abab252', 'abab30'],
              'sentdate': ['2-10-2020', '2-10-2020', '4-10-2020', '5-10-2020', '10-10-2020', '11-10-2020'],
              'receivedDate': ['2-10-2020', '2-10-2020', '4-10-2020', '5-10-2020', '10-10-2020', '11-10-2020']}
user = pd.DataFrame(user)
activities = pd.DataFrame(activities)
  sentconn receiveconj    sentdate receivedDate
0   abab35      abab24   2-10-2020    2-10-2020
1   abab15      abab24   2-10-2020    2-10-2020
2   abab25      abab21   4-10-2020    4-10-2020
3   abab23      abab35   5-10-2020    5-10-2020
4   abab22     abab252  10-10-2020   10-10-2020
5   abab15      abab30  11-10-2020   11-10-2020

        id         dob
0   abab23  10-10-1990
1   abab21   1-12-1993
2   abab22  12-12-2000
3   abab25   2-10-1999
4   abab24   2-10-1999
5   abab30   2-10-1999
6  abab252   2-10-1999
7   abab15   2-10-1999
What I want is to count, for each date, how many times each 'id' in the user DataFrame sent a connection request or received one.
date       id      sent_connection  receivedconnection
2-10-2020  abab15  1                0
           abab24  0                2
4-10-2020  abab25  1                0
           abab21  0                1
           abab23  1                0
           abab23  0                1
Something like this is what I want.
Recommended answer
Use:
# select only the necessary columns
activities = activities[['sentconn','receiveconj','sentdate','receivedDate']]
# set new column names of the form <type>_<field>
activities.columns = ['sent_id','receive_id','sent_date','receive_date']
# split the column names on _ into a MultiIndex
activities.columns = activities.columns.str.split('_', expand=True)
# reshape the DataFrame, then keep only ids present in user via an inner merge
activities = (activities.stack(0)
                        .rename_axis([None, 'type'])
                        .reset_index(level=1)
                        .merge(user['id']))
print(activities)
      type        date       id
0  receive   2-10-2020   abab24
1  receive   2-10-2020   abab24
2     sent   2-10-2020   abab15
3     sent  11-10-2020   abab15
4  receive   4-10-2020   abab21
5     sent   4-10-2020   abab25
6     sent   5-10-2020   abab23
7  receive  10-10-2020  abab252
8     sent  10-10-2020   abab22
9  receive  11-10-2020   abab30
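The reshape step above may be easier to follow when written without a MultiIndex. The following is a minimal sketch, not the answer's own method: it builds one long frame per role with `rename` and `assign`, stacks them with `pd.concat`, and filters with `isin` instead of an inner merge. The names `sent`, `received`, and `long_df` are introduced here for illustration only, and the row order differs from the output printed above.

```python
import pandas as pd

user = pd.DataFrame({'id': ['abab23', 'abab21', 'abab22', 'abab25',
                            'abab24', 'abab30', 'abab252', 'abab15']})
activities = pd.DataFrame({
    'sentconn': ['abab35', 'abab15', 'abab25', 'abab23', 'abab22', 'abab15'],
    'receiveconj': ['abab24', 'abab24', 'abab21', 'abab35', 'abab252', 'abab30'],
    'sentdate': ['2-10-2020', '2-10-2020', '4-10-2020',
                 '5-10-2020', '10-10-2020', '11-10-2020'],
    'receivedDate': ['2-10-2020', '2-10-2020', '4-10-2020',
                     '5-10-2020', '10-10-2020', '11-10-2020'],
})

# One frame per role, both with the same column names: date, id, type.
sent = (activities[['sentdate', 'sentconn']]
        .rename(columns={'sentdate': 'date', 'sentconn': 'id'})
        .assign(type='sent'))
received = (activities[['receivedDate', 'receiveconj']]
            .rename(columns={'receivedDate': 'date', 'receiveconj': 'id'})
            .assign(type='receive'))

# Stack the two frames, then keep only ids that exist in the user table
# (equivalent in effect to the inner merge on 'id').
long_df = pd.concat([sent, received], ignore_index=True)
long_df = long_df[long_df['id'].isin(user['id'])].reset_index(drop=True)
print(long_df)
```

The resulting frame holds the same rows as the merged `activities` frame, so it can be passed to `pd.crosstab` in the same way.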
#get counts by crosstab
df = pd.crosstab([activities['date'], activities['id']], activities['type'])
print (df)
type                receive  sent
date       id
10-10-2020 abab22         0     1
           abab252        1     0
11-10-2020 abab15         0     1
           abab30         1     0
2-10-2020  abab15         0     1
           abab24         2     0
4-10-2020  abab21         1     0
           abab25         0     1
5-10-2020  abab23         0     1
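Note that the dates are still strings, so the crosstab index sorts them lexicographically ('10-10-2020' comes before '2-10-2020'). If chronological order matters, one option is to parse the dates first. The sketch below assumes the dates are day-month-year, which the question does not state, and rebuilds the long-format frame by hand from the printed `activities` output above. The name `long_df` and the renamed count columns (taken from the asker's desired output) are illustrative choices.

```python
import pandas as pd

# Long-format data copied from the printed activities output above.
long_df = pd.DataFrame({
    'type': ['receive', 'receive', 'sent', 'sent', 'receive',
             'sent', 'sent', 'receive', 'sent', 'receive'],
    'date': ['2-10-2020', '2-10-2020', '2-10-2020', '11-10-2020', '4-10-2020',
             '4-10-2020', '5-10-2020', '10-10-2020', '10-10-2020', '11-10-2020'],
    'id': ['abab24', 'abab24', 'abab15', 'abab15', 'abab21',
           'abab25', 'abab23', 'abab252', 'abab22', 'abab30'],
})

# ASSUMPTION: dates are day-month-year. Use '%m-%d-%Y' if they are month-first.
long_df['date'] = pd.to_datetime(long_df['date'], format='%d-%m-%Y')

# Same crosstab as before; the date level now sorts chronologically.
counts = pd.crosstab([long_df['date'], long_df['id']], long_df['type'])

# Optional: rename the count columns to match the asker's desired output.
counts = counts.rename(columns={'sent': 'sent_connection',
                                'receive': 'receivedconnection'})
print(counts)
```

Keeping `date` and `id` together as the row index is what keeps the date column intact in the aggregated result; call `counts.reset_index()` if ordinary columns are preferred.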