python & pandas - Calculation between rows based on certain values in columns from DataFrame

Question

I have a large DataFrame (called df_NoMissing) with thousands of rows, and I need to run calculations and analysis on them.

   NoDemande   NoUsager  Sens  IdVehiculeUtilise  Fait  HeureArriveeSurSite   HeureEffective      Periods
0 42196000013  000001    +         287Véh          1  11/07/2015 08:02:07  11/07/2015 08:02:13    Matin
1 42196000013  000001    -         287Véh          1  11/07/2015 08:17:09  11/07/2015 08:17:13    Matin
2 42196000002  000314    +         263Véh          1  11/07/2015 09:37:43  11/07/2015 09:53:37    Matin
3 42196000016  002372    +         287Véh          1  11/07/2015 09:46:42  11/07/2015 10:01:39    Matin
4 42196000015  000466    +         287Véh          1  11/07/2015 09:46:42  11/07/2015 10:01:39    Matin
5 42196000002  000314    -         263Véh          1  11/07/2015 10:25:17  11/07/2015 10:38:11    Matin
6 42196000015  000466    -         287Véh          1  11/07/2015 10:48:51  11/07/2015 10:51:30    Matin
7 42196000016  002372    -         287Véh          1  11/07/2015 11:40:56  11/07/2015 11:41:01    Matin
8 42196000004  002641    +         263Véh          1  11/07/2015 13:39:29  11/07/2015 13:52:50    Soir
9 42196000004  002641    -         263Véh          1  11/07/2015 13:59:56  11/07/2015 14:07:41    Soir  

What I want to do is take each pair of rows that have the same values in the columns NoDemande, NoUsager and Periods but different values in Sens, and subtract HeureEffective (from the + row) from HeureArriveeSurSite (from the - row). Because the result does not fit the shape of the current DataFrame, it will be saved in a new DataFrame.

I tried to split the DataFrame by Sens so I could do the subtraction directly, but it doesn't work at all.

df_new = pd.DataFrame(columns=['NoDemande', 'NoUsager', 'Periods', 'DureeTrajet'])
df1 = df_NoMissing[(df_NoMissing['Sens'] == '+') & (df_NoMissing['Periods'] == 'Matin')]
df2 = df_NoMissing[(df_NoMissing['Sens'] == '-') & (df_NoMissing['Periods'] == 'Matin')] 

df_new['DureeTrajet'] = df2['HeureArriveeSurSite'].values-df1['HeureEffective'].values

This returns: ValueError: operands could not be broadcast together with shapes (1478,) (1479,)
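The shapes (1478,) and (1479,) suggest that at least one row has no partner with the opposite Sens, so the two positional arrays cannot line up. A minimal sketch for locating such rows, using hypothetical miniature data (the keys "A", "B", "C" are invented for illustration) and an outer merge with `indicator=True`:

```python
import pandas as pd

# Hypothetical miniature data: request "C" has a '+' row but no '-' row.
df = pd.DataFrame({
    "NoDemande": ["A", "A", "B", "B", "C"],
    "Sens": ["+", "-", "+", "-", "+"],
})

plus = df.loc[df["Sens"] == "+", ["NoDemande"]]
minus = df.loc[df["Sens"] == "-", ["NoDemande"]]

# The two halves have different lengths, which is what breaks
# a positional subtraction on .values.
print(len(plus), len(minus))

# An outer merge with indicator=True records where each key was found
# ("both", "left_only" or "right_only").
check = plus.merge(minus, on="NoDemande", how="outer", indicator=True)
unmatched = check.loc[check["_merge"] != "both", "NoDemande"].tolist()
print(unmatched)
```

In the real data the merge key would presumably be the full combination NoDemande, NoUsager and Periods rather than a single column.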

I also tried the explicit way, stating exactly what I want each time:

df1.loc[df1['NoDemande'] == '42196000015','HeureEffective'] - df2.loc[df2['NoDemande'] == '42196000015','HeureArriveeSurSite']

But this returned:

4   NaT
6   NaT
dtype: timedelta64[ns]
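The NaT values are most likely caused by index alignment: pandas subtracts two Series label by label, and the selected rows carry the labels 4 and 6, which never match. The sketch below reproduces this with the two timestamps for request 42196000015 from the sample table; the day-first date format is an assumption and does not affect the duration.

```python
import pandas as pd

fmt = "%d/%m/%Y %H:%M:%S"  # assumed day-first format

# One-row Series with different index labels, mimicking rows 4 and 6.
start = pd.Series(pd.to_datetime(["11/07/2015 10:01:39"], format=fmt), index=[4])
end = pd.Series(pd.to_datetime(["11/07/2015 10:48:51"], format=fmt), index=[6])

# Label-based alignment: labels 4 and 6 share nothing, so every result is NaT.
aligned = end - start
print(aligned)

# Positional subtraction: discard the old labels on both sides first.
positional = end.reset_index(drop=True) - start.reset_index(drop=True)
print(positional)
```

Resetting the index works for a single pair, but for the whole DataFrame it is safer to align on the key columns than on position.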

So what should I do to get what I want?

Edit

The output would look like this:

  NoDemande   NoUsager   Periods   DureeTrajet
0 42196000013  000001     Matin     00:14:54
1 42196000002  000314     Matin     00:31:40
2 42196000016  002372     Matin     00:39:23
3 42196000015  000466     Matin     00:47:12
4 42196000004  002641     Soir      00:07:06

Any help would be much appreciated.

Recommended answer

Okay, starting with your DF as provided, let's create an index on the grouping columns and pivot the Sens values into columns:

temp = df.set_index(['NoDemande', 'NoUsager', 'Periods']).pivot(columns='Sens')

Then we take the appropriate difference (following your code):

duration = (temp['HeureArriveeSurSite', '-'] - temp['HeureEffective', '+']).to_frame(name='DureeTrajet').reset_index()

That gives you:

     NoDemande  NoUsager Periods  DureeTrajet
0  42196000002       314   Matin     00:31:40
1  42196000004      2641    Soir     00:07:06
2  42196000013         1   Matin     00:14:56
3  42196000015       466   Matin     00:47:12
4  42196000016      2372   Matin     01:39:17
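For reference, here is a self-contained sketch that rebuilds the sample rows and reproduces the durations above. It makes a few assumptions: the time columns are parsed as day-first datetimes, the identifiers are kept as strings, and the reshape is written as `set_index(...).unstack('Sens')`, which should be equivalent to the pivot above without depending on how `pivot` handles an existing MultiIndex in a given pandas version.

```python
import pandas as pd

# Sample rows from the question (only the columns needed here).
rows = [
    ("42196000013", "000001", "+", "11/07/2015 08:02:07", "11/07/2015 08:02:13", "Matin"),
    ("42196000013", "000001", "-", "11/07/2015 08:17:09", "11/07/2015 08:17:13", "Matin"),
    ("42196000002", "000314", "+", "11/07/2015 09:37:43", "11/07/2015 09:53:37", "Matin"),
    ("42196000016", "002372", "+", "11/07/2015 09:46:42", "11/07/2015 10:01:39", "Matin"),
    ("42196000015", "000466", "+", "11/07/2015 09:46:42", "11/07/2015 10:01:39", "Matin"),
    ("42196000002", "000314", "-", "11/07/2015 10:25:17", "11/07/2015 10:38:11", "Matin"),
    ("42196000015", "000466", "-", "11/07/2015 10:48:51", "11/07/2015 10:51:30", "Matin"),
    ("42196000016", "002372", "-", "11/07/2015 11:40:56", "11/07/2015 11:41:01", "Matin"),
    ("42196000004", "002641", "+", "11/07/2015 13:39:29", "11/07/2015 13:52:50", "Soir"),
    ("42196000004", "002641", "-", "11/07/2015 13:59:56", "11/07/2015 14:07:41", "Soir"),
]
df = pd.DataFrame(rows, columns=["NoDemande", "NoUsager", "Sens",
                                 "HeureArriveeSurSite", "HeureEffective", "Periods"])

# Subtraction needs real datetimes; the day-first format is an assumption.
for col in ["HeureArriveeSurSite", "HeureEffective"]:
    df[col] = pd.to_datetime(df[col], format="%d/%m/%Y %H:%M:%S")

keys = ["NoDemande", "NoUsager", "Periods"]

# Move Sens into the columns: one row per key, one column per (field, Sens).
temp = df.set_index(keys + ["Sens"]).unstack("Sens")

# Arrival of the '-' row minus effective time of the '+' row.
duration = (
    (temp[("HeureArriveeSurSite", "-")] - temp[("HeureEffective", "+")])
    .to_frame(name="DureeTrajet")
    .reset_index()
)
print(duration)
```

Because the rows are matched on the key columns, a request that lacks either its + or its - row ends up with NaT in DureeTrajet instead of raising a shape error.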
