How to assign a unique ID for different groups in pandas dataframe?

Question

How can I assign unique IDs to groups in a pandas dataframe based on certain conditions? For example, I have a dataframe named df with the following structure: Name identifies the user, and Datetime identifies the date/time at which the user accessed a resource.

Name         Datetime 
Bob          26-04-2018 12:00:00 
Claire       26-04-2018 12:00:00 
Bob          26-04-2018 12:10:00 
Bob          26-04-2018 12:30:00 
Grace        27-04-2018 08:30:00 
Bob          27-04-2018 09:30:00 
Bob          27-04-2018 09:40:00 
Bob          27-04-2018 10:00:00 
Bob          27-04-2018 10:30:00 
Bob          27-04-2018 11:30:00

I would like to create sessions for the users. Consecutive accesses by the same user that are no more than 30 minutes apart should belong to the same session and share one unique session Id. However, if a user is inactive for more than 30 minutes, that user's next access should be assigned a new session.

My expected output is shown below.

For example, on 27-04-2018 user Bob accessed the resource at 09:30, 09:40, 10:00 and 10:30; all four accesses belong to session 4. His next access is at 11:30, which is more than 30 minutes after the previous one, so it is assigned the next session (5).

Name         Datetime                    Id
Bob          26-04-2018 12:00:00          1
Claire       26-04-2018 12:00:00          2
Bob          26-04-2018 12:10:00          1
Bob          26-04-2018 12:30:00          1
Grace        27-04-2018 08:30:00          3
Bob          27-04-2018 09:30:00          4
Bob          27-04-2018 09:40:00          4
Bob          27-04-2018 10:00:00          4
Bob          27-04-2018 10:30:00          4
Bob          27-04-2018 11:30:00          5

Thank you for your help! Link to previous question: How to compare value of second column with same values of first column in pandas dataframe?

Recommended answer

Your explanation near the bottom is really helpful for understanding the problem.

You need to group by Name and a groupID (don't confuse this groupID with your final Id) and call ngroup to get the Id. The main question is how to define this groupID. To create it, use sort_values to sort the rows by Name and then by Datetime in ascending order (this assumes Datetime is already a datetime column, not a string). Group by Name and take the difference in Datetime between consecutive rows within each Name. Use gt to flag differences greater than 30 minutes and cumsum to turn those flags into a groupID that increases at every flagged row. Finally, sort_index restores the original row order. Assign the result to s as follows:

# Flag gaps of more than 30 minutes within each Name; the running total is the groupID.
s = (df.sort_values(['Name', 'Datetime']).groupby('Name')['Datetime'].diff()
       .gt(pd.Timedelta(minutes=30)).cumsum().sort_index())
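
To make the chain above easier to follow, here is a sketch that rebuilds the sample data from the question and computes each intermediate step separately. It assumes the Datetime strings are parsed with pd.to_datetime using the day-first format shown in the question; the names ordered, gap and new_session are illustrative only and do not appear in the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": ["Bob", "Claire", "Bob", "Bob", "Grace",
             "Bob", "Bob", "Bob", "Bob", "Bob"],
    "Datetime": [
        "26-04-2018 12:00:00", "26-04-2018 12:00:00", "26-04-2018 12:10:00",
        "26-04-2018 12:30:00", "27-04-2018 08:30:00", "27-04-2018 09:30:00",
        "27-04-2018 09:40:00", "27-04-2018 10:00:00", "27-04-2018 10:30:00",
        "27-04-2018 11:30:00",
    ],
})
# Assumption: the strings are day-first, so the format is given explicitly.
df["Datetime"] = pd.to_datetime(df["Datetime"], format="%d-%m-%Y %H:%M:%S")

# Step 1: order rows by user, then by time.
ordered = df.sort_values(["Name", "Datetime"])
# Step 2: time since the same user's previous access (NaT for a user's first row).
gap = ordered.groupby("Name")["Datetime"].diff()
# Step 3: True where the gap exceeds 30 minutes; NaT compares as False.
new_session = gap.gt(pd.Timedelta(minutes=30))
# Step 4: running count of session breaks, restored to the original row order.
s = new_session.cumsum().sort_index()

print(s.tolist())  # [0, 2, 0, 0, 2, 1, 1, 1, 1, 2]
```

Note that cumsum runs over the whole sorted Series rather than per user, so s on its own is not a session number (Claire and Grace both get 2 here). It only has to change value at each session break within a Name, which is why it is combined with Name in the next step.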

Next, group by Name and s with sort=False to preserve the original order, call ngroup, and add 1 so that the Ids start at 1.

df['Id'] = df.groupby(['Name', s], sort=False).ngroup().add(1)

Out[834]:
     Name            Datetime  Id
0     Bob 2018-04-26 12:00:00   1
1  Claire 2018-04-26 12:00:00   2
2     Bob 2018-04-26 12:10:00   1
3     Bob 2018-04-26 12:30:00   1
4   Grace 2018-04-27 08:30:00   3
5     Bob 2018-04-27 09:30:00   4
6     Bob 2018-04-27 09:40:00   4
7     Bob 2018-04-27 10:00:00   4
8     Bob 2018-04-27 10:30:00   4
9     Bob 2018-04-27 11:30:00   5
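
For reuse, the two steps can be wrapped in a small helper. This is only a sketch under stated assumptions: the function name assign_session_ids and its parameters are hypothetical (not part of pandas or of the original answer), the time column must already be a datetime column, and the frame's index must be unique so that sort_index can restore the original row order.

```python
import pandas as pd


def assign_session_ids(df, name_col="Name", time_col="Datetime",
                       timeout=pd.Timedelta(minutes=30)):
    """Return 1-based session Ids, numbered by first appearance in df.

    Hypothetical helper: a new session starts whenever the same user's
    previous access is more than `timeout` earlier.
    """
    # Per-user gap between consecutive accesses, computed in time order.
    gaps = df.sort_values([name_col, time_col]).groupby(name_col)[time_col].diff()
    # Running count of session breaks, aligned back to the original rows.
    marker = gaps.gt(timeout).cumsum().sort_index().rename("session_marker")
    # sort=False numbers the (name, marker) groups in order of first appearance.
    return df.groupby([name_col, marker], sort=False).ngroup().add(1)


# Usage with the sample data from the question.
sample = pd.DataFrame({
    "Name": ["Bob", "Claire", "Bob", "Bob", "Grace",
             "Bob", "Bob", "Bob", "Bob", "Bob"],
    "Datetime": pd.to_datetime([
        "26-04-2018 12:00:00", "26-04-2018 12:00:00", "26-04-2018 12:10:00",
        "26-04-2018 12:30:00", "27-04-2018 08:30:00", "27-04-2018 09:30:00",
        "27-04-2018 09:40:00", "27-04-2018 10:00:00", "27-04-2018 10:30:00",
        "27-04-2018 11:30:00",
    ], format="%d-%m-%Y %H:%M:%S"),
})
sample["Id"] = assign_session_ids(sample)
print(sample["Id"].tolist())  # [1, 2, 1, 1, 3, 4, 4, 4, 4, 5]
```

A gap of exactly 30 minutes (Bob at 10:00 and 10:30) stays in the same session because the comparison is strictly greater than the timeout, which matches the expected output in the question.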
