python & pandas - Calculation between rows based on certain values in columns from DataFrame
Problem description
I have a large DataFrame (called df_NoMissing) with thousands of rows, and I need to run calculations and analysis on it.
NoDemande NoUsager Sens IdVehiculeUtilise Fait HeureArriveeSurSite HeureEffective Periods
0 42196000013 000001 + 287Véh 1 11/07/2015 08:02:07 11/07/2015 08:02:13 Matin
1 42196000013 000001 - 287Véh 1 11/07/2015 08:17:09 11/07/2015 08:17:13 Matin
2 42196000002 000314 + 263Véh 1 11/07/2015 09:37:43 11/07/2015 09:53:37 Matin
3 42196000016 002372 + 287Véh 1 11/07/2015 09:46:42 11/07/2015 10:01:39 Matin
4 42196000015 000466 + 287Véh 1 11/07/2015 09:46:42 11/07/2015 10:01:39 Matin
5 42196000002 000314 - 263Véh 1 11/07/2015 10:25:17 11/07/2015 10:38:11 Matin
6 42196000015 000466 - 287Véh 1 11/07/2015 10:48:51 11/07/2015 10:51:30 Matin
7 42196000016 002372 - 287Véh 1 11/07/2015 11:40:56 11/07/2015 11:41:01 Matin
8 42196000004 002641 + 263Véh 1 11/07/2015 13:39:29 11/07/2015 13:52:50 Soir
9 42196000004 002641 - 263Véh 1 11/07/2015 13:59:56 11/07/2015 14:07:41 Soir
What I want to do is take each pair of rows that have the same values in the columns NoDemande, NoUsager, and Periods but different values in the column Sens, and subtract between the columns HeureArriveeSurSite and HeureEffective. Because the result doesn't match the shape of the current DataFrame, it will be saved in a new DataFrame.
I tried to split the DataFrame by Sens so I could do the subtraction directly. But it doesn't work at all.
df_new = pd.DataFrame(columns=['NoDemande', 'NoUsager', 'Periods', 'DureeTrajet'])
df1 = df_NoMissing[(df_NoMissing['Sens'] == '+') & (df_NoMissing['Periods'] == 'Matin')]
df2 = df_NoMissing[(df_NoMissing['Sens'] == '-') & (df_NoMissing['Periods'] == 'Matin')]
df_new['DureeTrajet'] = df2['HeureArriveeSurSite'].values-df1['HeureEffective'].values
This one returns: ValueError: operands could not be broadcast together with shapes (1478,) (1479,)
I also tried the long way, spelling out exactly what I want each time:
df1.loc[df1['NoDemande'] == '42196000015','HeureEffective'] - df2.loc[df2['NoDemande'] == '42196000015','HeureArriveeSurSite']
But this returned:
4 NaT
6 NaT
dtype: timedelta64[ns]
So what should I do to get what I want?
EDIT
The output would look like this:
NoDemande NoUsager Periods DureeTrajet
0 42196000013 000001 Matin 00:14:54
1 42196000002 000314 Matin 00:31:40
2 42196000016 002372 Matin 00:39:23
3 42196000015 000466 Matin 00:47:12
4 42196000004 002641 Soir 00:07:06
Any help would be greatly appreciated.
Recommended answer
Okay, starting with your DF as provided, let's create an index on the grouping columns and pivot the Sens action into columns:
temp = df.set_index(['NoDemande', 'NoUsager', 'Periods']).pivot(columns='Sens')
Then we take the appropriate difference (following your code):
duration = (temp['HeureArriveeSurSite', '-'] - temp['HeureEffective', '+']).to_frame(name='DureeTrajet').reset_index()
That gives you:
NoDemande NoUsager Periods DureeTrajet
0 42196000002 314 Matin 00:31:40
1 42196000004 2641 Soir 00:07:06
2 42196000013 1 Matin 00:14:56
3 42196000015 466 Matin 00:47:12
4 42196000016 2372 Matin 01:39:17
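For anyone who wants to try this on the sample rows, here is a minimal self-contained sketch. It rests on some assumptions that are not stated in the question: the timestamps are read as day/month/year, the ID columns are kept as strings (so NoUsager keeps its leading zeros, unlike the output above), and each (NoDemande, NoUsager, Periods, Sens) combination occurs only once. The reshape is written as set_index plus unstack, which should be equivalent to the pivot call above; with duplicate combinations the reshape would raise an error instead.

```python
import pandas as pd

# Sample rows copied from the question (IDs kept as strings).
rows = [
    ("42196000013", "000001", "+", "11/07/2015 08:02:07", "11/07/2015 08:02:13", "Matin"),
    ("42196000013", "000001", "-", "11/07/2015 08:17:09", "11/07/2015 08:17:13", "Matin"),
    ("42196000002", "000314", "+", "11/07/2015 09:37:43", "11/07/2015 09:53:37", "Matin"),
    ("42196000016", "002372", "+", "11/07/2015 09:46:42", "11/07/2015 10:01:39", "Matin"),
    ("42196000015", "000466", "+", "11/07/2015 09:46:42", "11/07/2015 10:01:39", "Matin"),
    ("42196000002", "000314", "-", "11/07/2015 10:25:17", "11/07/2015 10:38:11", "Matin"),
    ("42196000015", "000466", "-", "11/07/2015 10:48:51", "11/07/2015 10:51:30", "Matin"),
    ("42196000016", "002372", "-", "11/07/2015 11:40:56", "11/07/2015 11:41:01", "Matin"),
    ("42196000004", "002641", "+", "11/07/2015 13:39:29", "11/07/2015 13:52:50", "Soir"),
    ("42196000004", "002641", "-", "11/07/2015 13:59:56", "11/07/2015 14:07:41", "Soir"),
]
cols = ["NoDemande", "NoUsager", "Sens",
        "HeureArriveeSurSite", "HeureEffective", "Periods"]
df = pd.DataFrame(rows, columns=cols)

# Assumption: dates are day/month/year. The subtraction needs real datetimes.
for col in ["HeureArriveeSurSite", "HeureEffective"]:
    df[col] = pd.to_datetime(df[col], format="%d/%m/%Y %H:%M:%S")

# Move Sens from the rows into the columns: one row per trip,
# with a '+' and a '-' value for each timestamp column.
keys = ["NoDemande", "NoUsager", "Periods"]
temp = df.set_index(keys + ["Sens"]).unstack("Sens")

# Arrival of the '-' row minus effective time of the '+' row.
duration = (
    temp[("HeureArriveeSurSite", "-")] - temp[("HeureEffective", "+")]
).rename("DureeTrajet").reset_index()

print(duration)
```

Because both timestamps for a trip end up on the same row after the reshape, the subtraction no longer depends on the two halves having equal length or matching row labels. The second attempt in the question returned NaT for that reason: pandas aligns Series on their index labels, and labels 4 and 6 have no counterpart in the other Series.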