Perform pandas aggregation while keeping the date column intact

Problem description

import pandas as pd

user = {'id': ['abab23', 'abab21', 'abab22', 'abab25', 'abab24', 'abab30', 'abab252', 'abab15'],
        'dob': ['10-10-1990', '1-12-1993', '12-12-2000', '2-10-1999', '2-10-1999', '2-10-1999', '2-10-1999', '2-10-1999']}

activities = {'sentconn': ['abab35', 'abab15', 'abab25', 'abab23', 'abab22', 'abab15'],
              'receiveconj': ['abab24', 'abab24', 'abab21', 'abab35', 'abab252', 'abab30'],
              'sentdate': ['2-10-2020', '2-10-2020', '4-10-2020', '5-10-2020', '10-10-2020', '11-10-2020'],
              'receivedDate': ['2-10-2020', '2-10-2020', '4-10-2020', '5-10-2020', '10-10-2020', '11-10-2020']}

user = pd.DataFrame(user)
activities = pd.DataFrame(activities)
    
    
  sentconn receiveconj    sentdate receivedDate
0   abab35      abab24   2-10-2020    2-10-2020
1   abab15      abab24   2-10-2020    2-10-2020
2   abab25      abab21   4-10-2020    4-10-2020
3   abab23      abab35   5-10-2020    5-10-2020
4   abab22     abab252  10-10-2020   10-10-2020
5   abab15      abab30  11-10-2020   11-10-2020

        id         dob
0   abab23  10-10-1990
1   abab21   1-12-1993
2   abab22  12-12-2000
3   abab25   2-10-1999
4   abab24   2-10-1999
5   abab30   2-10-1999
6  abab252   2-10-1999
7   abab15   2-10-1999

What I want is to count, for each date, how many times each 'id' in the user dataframe sent a connection request or received one.

date         id      sent_connection   receivedconnection
2-10-2020   abab15    1                     0
            abab24    0                     2

4-10-2020   abab25    1                     0
            abab21    0                     1
            abab23    1                     0
            abab23    0                     1

Something like this is what I want.

Recommended answer

Use:

#select only the necessary columns
activities = activities[['sentconn','receiveconj','sentdate','receivedDate']]

#set new column names in the form <type>_<field>
activities.columns = ['sent_id','receive_id','sent_date','receive_date']

#split column names by _ into a MultiIndex (type, field)
activities.columns = activities.columns.str.split('_', expand=True)

#reshape to long format, then keep only ids present in user via an inner merge
activities = (activities.stack(0)
                        .rename_axis([None, 'type'])
                        .reset_index(level=1)
                        .merge(user['id']))
print (activities)
      type        date       id
0  receive   2-10-2020   abab24
1  receive   2-10-2020   abab24
2     sent   2-10-2020   abab15
3     sent  11-10-2020   abab15
4  receive   4-10-2020   abab21
5     sent   4-10-2020   abab25
6     sent   5-10-2020   abab23
7  receive  10-10-2020  abab252
8     sent  10-10-2020   abab22
9  receive  11-10-2020   abab30


#get counts by crosstab
df = pd.crosstab([activities['date'], activities['id']], activities['type'])
print (df)
type                receive  sent
date       id                    
10-10-2020 abab22         0     1
           abab252        1     0
11-10-2020 abab15         0     1
           abab30         1     0
2-10-2020  abab15         0     1
           abab24         2     0
4-10-2020  abab21         1     0
           abab25         0     1
5-10-2020  abab23         0     1
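
Note that the crosstab above sorts the dates as strings, so 10-10-2020 comes before 2-10-2020. If chronological order matters, one option is to parse the dates first. Below is a minimal sketch of that idea, not the answer's own method: it builds the long table with `concat` instead of `stack`, and it assumes the dates are day-month-year (the question does not say which). The names `sent`, `received`, `long_df` and `counts` are only illustrative.

```python
import pandas as pd

user = pd.DataFrame({
    'id': ['abab23', 'abab21', 'abab22', 'abab25', 'abab24', 'abab30', 'abab252', 'abab15'],
    'dob': ['10-10-1990', '1-12-1993', '12-12-2000', '2-10-1999', '2-10-1999',
            '2-10-1999', '2-10-1999', '2-10-1999'],
})
activities = pd.DataFrame({
    'sentconn': ['abab35', 'abab15', 'abab25', 'abab23', 'abab22', 'abab15'],
    'receiveconj': ['abab24', 'abab24', 'abab21', 'abab35', 'abab252', 'abab30'],
    'sentdate': ['2-10-2020', '2-10-2020', '4-10-2020', '5-10-2020', '10-10-2020', '11-10-2020'],
    'receivedDate': ['2-10-2020', '2-10-2020', '4-10-2020', '5-10-2020', '10-10-2020', '11-10-2020'],
})

# One long table per event type, with common column names
sent = (activities[['sentconn', 'sentdate']]
        .rename(columns={'sentconn': 'id', 'sentdate': 'date'})
        .assign(type='sent'))
received = (activities[['receiveconj', 'receivedDate']]
            .rename(columns={'receiveconj': 'id', 'receivedDate': 'date'})
            .assign(type='receive'))

long_df = pd.concat([sent, received], ignore_index=True)

# Keep only ids that exist in the user table
long_df = long_df[long_df['id'].isin(user['id'])]

# Assumption: dates are day-month-year; change the format if they are month-day-year
long_df = long_df.assign(date=pd.to_datetime(long_df['date'], format='%d-%m-%Y'))

# Count events per (date, id), one column per event type
counts = pd.crosstab([long_df['date'], long_df['id']], long_df['type'])
print(counts)
```

The counts are the same as in the crosstab above; only the row order changes, because the first index level is now a real datetime.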
