How to calculate the total length of a process until it changes, using two columns in Python?

Problem description

Here is a snippet of the data frame (the original contains 8k rows):

     User   State      change_datetime  endstate
0  100234     XIM  2016-01-19 17:03:12  Inactive
1  100234  Active  2016-01-28 17:17:15       XIM
2  100234  Active  2016-02-16 17:57:50       NaN
3  100234    Live  2016-02-16 17:58:51    Active
4  213421     XIM  2016-02-16 17:57:53       NaN
5  213421  Active  2018-02-01 10:01:51       XIM
6  213421  Active  2018-02-01 20:49:41       NaN
7  213421  Active  2018-02-13 20:40:11       NaN
8  213421       R  2018-03-04 05:38:51    Active
9  612312    B-98  2018-11-01 17:12:11       XIM

I'm trying to find out how long each unique User spends in the 'Active' state before changing to any other state. The 'endstate' column holds 'Active' on the row where the user leaves that state, so I want the total time difference from the first row where 'State' is 'Active' to the row where 'endstate' is 'Active'.

Originally, I used the following code:

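import pandas as pd

# Parse the timestamps so they can be subtracted.
df["change_datetime"] = pd.to_datetime(df["change_datetime"])

# Rows where the user is 'Active', and rows whose next row is not 'Active'.
cond1 = df["State"].eq("Active")
cond2 = df["State"].shift(-1).ne("Active")

# Per user: first 'Active' timestamp, and the last 'Active' timestamp of the first run.
start = df.loc[cond1].groupby("User")["change_datetime"].first()
end = df.loc[cond1 & cond2].groupby("User")["change_datetime"].first()

print(end - start)
Active_state_duration = (end - start).to_frame()
Active_state_duration.head()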

This returns:

User
100234   19 days 00:40:35
213421   12 days 10:38:20
Name: change_datetime, dtype: timedelta64[ns]

For User 100234, 19 days 00:40:35 is calculated from lines 2 and 3 (index 1 and 2). It should be 19 days 00:41:36, measured up to line 4 (index 3), because the User takes 1 minute and 1 second to transition from 'Active' to 'Live'.
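The one-minute-one-second gap can be checked directly from the timestamps in the sample data. This is a minimal sketch using only the three relevant rows for User 100234; the variable names are illustrative and not part of the original code.

```python
import pandas as pd

# Timestamps copied from the sample rows for User 100234.
first_active = pd.Timestamp("2016-01-28 17:17:15")  # first row with State == 'Active'
last_active = pd.Timestamp("2016-02-16 17:57:50")   # last row with State == 'Active'
transition = pd.Timestamp("2016-02-16 17:58:51")    # row with endstate == 'Active' (State == 'Live')

reported = last_active - first_active  # what the original code measures
expected = transition - first_active   # what the question asks for

print(reported)             # 19 days 00:40:35
print(expected)             # 19 days 00:41:36
print(expected - reported)  # 0 days 00:01:01
```

The original code stops at the last 'Active' row, so it misses the interval between that row and the row that records the transition.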

I was hoping to use the 'endstate' column so that the time a User spends 'Active' is measured from the 'State' column until the next row that has 'Active' in 'endstate' and a value other than 'Active' in 'State'. Here is an example of how I'm hoping to calculate the duration:

Is there a way to do this?

Here is how I'm trying to calculate the duration:

Recommended answer

Use Series.eq to create a boolean mask m, filter the dataframe with this mask, then use DataFrame.groupby and aggregate the change_datetime column with np.ptp:

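import numpy as np
# Keep 'Active' rows plus the row recording the transition out of 'Active'; (~m).cumsum() labels each such run.
m = df['State'].eq('Active') | df['endstate'].eq('Active')
s = df[m].groupby(['User', (~m).cumsum()])['change_datetime'].agg(np.ptp).droplevel(1)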
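To try the masked groupby on its own, here is a self-contained sketch that rebuilds the ten sample rows from the question. It makes two substitutions that are my assumptions rather than the answer's exact code: the run label is stored in a helper column called `block` (a hypothetical name), and `np.ptp` is replaced by an explicit max minus min, which computes the same peak-to-peak range.

```python
import pandas as pd

# Sample rows copied from the question; None stands in for NaN.
df = pd.DataFrame({
    "User": [100234] * 4 + [213421] * 5 + [612312],
    "State": ["XIM", "Active", "Active", "Live", "XIM",
              "Active", "Active", "Active", "R", "B-98"],
    "change_datetime": [
        "2016-01-19 17:03:12", "2016-01-28 17:17:15", "2016-02-16 17:57:50",
        "2016-02-16 17:58:51", "2016-02-16 17:57:53", "2018-02-01 10:01:51",
        "2018-02-01 20:49:41", "2018-02-13 20:40:11", "2018-03-04 05:38:51",
        "2018-11-01 17:12:11",
    ],
    "endstate": ["Inactive", "XIM", None, "Active", None,
                 "XIM", None, None, "Active", "XIM"],
})
df["change_datetime"] = pd.to_datetime(df["change_datetime"])

# True for 'Active' rows and for the row that records leaving 'Active'.
m = df["State"].eq("Active") | df["endstate"].eq("Active")

# Each False row increments the counter, so every consecutive run of
# True rows shares one label ('block' is an illustrative column name).
runs = df.assign(block=(~m).cumsum())[m]

# Peak-to-peak range per run: latest minus earliest timestamp.
s = (
    runs.groupby(["User", "block"])["change_datetime"]
    .agg(lambda x: x.max() - x.min())
    .droplevel(1)
)
print(s)
```

User 612312 never appears in an 'Active' run, so it drops out of the result.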

Or, if you only ever need to consider one transition per user in the dataframe:

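# m1: rows in the 'Active' state; m2: rows that record leaving 'Active'.
m1 = df['State'].eq('Active')
m2 = ~m1 & df['endstate'].eq('Active')

# First 'Active' timestamp and first transition timestamp per user.
s1 = df[m1].groupby('User')['change_datetime'].first()
s2 = df[m2].groupby('User')['change_datetime'].first()

s = s2.sub(s1)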

Result:

print(s)
User
100234   19 days 00:41:36
213421   30 days 19:37:00
Name: change_datetime, dtype: timedelta64[ns]
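The one-transition-per-user variant can also be wrapped in a small function and run on the sample rows. This is a sketch under that same assumption of a single transition per user; the function name `active_duration_single_transition` is hypothetical and introduced only for illustration.

```python
import pandas as pd


def active_duration_single_transition(df: pd.DataFrame) -> pd.Series:
    """Time from a user's first 'Active' row to the first row that leaves 'Active'.

    Assumes at most one Active-to-other transition per user matters.
    """
    m1 = df["State"].eq("Active")              # rows in the 'Active' state
    m2 = ~m1 & df["endstate"].eq("Active")     # rows recording the exit from 'Active'
    start = df[m1].groupby("User")["change_datetime"].first()
    end = df[m2].groupby("User")["change_datetime"].first()
    return end.sub(start)


# Sample rows copied from the question; None stands in for NaN.
df = pd.DataFrame({
    "User": [100234] * 4 + [213421] * 5 + [612312],
    "State": ["XIM", "Active", "Active", "Live", "XIM",
              "Active", "Active", "Active", "R", "B-98"],
    "change_datetime": pd.to_datetime([
        "2016-01-19 17:03:12", "2016-01-28 17:17:15", "2016-02-16 17:57:50",
        "2016-02-16 17:58:51", "2016-02-16 17:57:53", "2018-02-01 10:01:51",
        "2018-02-01 20:49:41", "2018-02-13 20:40:11", "2018-03-04 05:38:51",
        "2018-11-01 17:12:11",
    ]),
    "endstate": ["Inactive", "XIM", None, "Active", None,
                 "XIM", None, None, "Active", "XIM"],
})

result = active_duration_single_transition(df)
print(result)
```

If a user can enter and leave 'Active' more than once, this version only measures the first pass, and the run-labelling approach with `(~m).cumsum()` is the one to use.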
