Mean of a grouped-by pandas dataframe
Problem description
I need to calculate the per-day mean of the columns duration and km, separately for the rows with value == 1 and the rows with value == 0.
df
Out[20]:
Date duration km value
0 2015-03-28 09:07:00.800001 0 0 0
1 2015-03-28 09:36:01.819998 1 2 1
2 2015-03-30 09:36:06.839997 1 3 1
3 2015-03-30 09:37:27.659997 nan 5 0
4 2015-04-22 09:51:40.440003 3 7 0
5 2015-04-23 10:15:25.080002 0 nan 1
How can I modify this solution to get the means duration_value0, duration_value1, km_value0 and km_value1?
df = df.set_index('Date').groupby(pd.Grouper(freq='d')).mean().dropna(how='all')
print (df)
duration km
Date
2015-03-28 0.5 1.0
2015-03-30 1.5 4.0
2015-04-22 3.0 7.0
2015-04-23 0.0 0.0
I think you are looking for a pivot table, i.e.:
df.pivot_table(values=['duration','km'],columns=['value'],index=df['Date'].dt.date,aggfunc='mean')
Output:
           duration       km
value             0    1    0    1
Date
2015-03-28      0.0  1.0  0.0  2.0
2015-03-30      NaN  1.0  5.0  3.0
2015-04-22      3.0  NaN  7.0  NaN
2015-04-23      NaN  0.0  NaN  NaN
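For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample frame from the question and applies the same pivot_table call. The frame construction is an assumption based on the printed data (in particular, that Date is a datetime64 column and the missing entries are NaN); the variable name ndf follows the answer.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's printed frame (assumed dtypes).
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2015-03-28 09:07:00.800001",
        "2015-03-28 09:36:01.819998",
        "2015-03-30 09:36:06.839997",
        "2015-03-30 09:37:27.659997",
        "2015-04-22 09:51:40.440003",
        "2015-04-23 10:15:25.080002",
    ]),
    "duration": [0, 1, 1, np.nan, 3, 0],
    "km": [0, 2, 3, 5, 7, np.nan],
    "value": [0, 1, 1, 0, 0, 1],
})

# One row per calendar day, one column per (measure, value) pair.
ndf = df.pivot_table(
    values=["duration", "km"],
    columns=["value"],
    index=df["Date"].dt.date,
    aggfunc="mean",
)
print(ndf)
```

Cells stay NaN wherever a day has no non-missing observation for that value, which is why 2015-04-22 has nothing in the value == 1 columns.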
If you want flat column names such as duration0, duration1, ..., you can use a list comprehension. Assuming the pivot table is stored in ndf:
ndf.columns = [i[0]+str(i[1]) for i in ndf.columns]
Output:
            duration0  duration1  km0  km1
Date
2015-03-28        0.0        1.0  0.0  2.0
2015-03-30        NaN        1.0  5.0  3.0
2015-04-22        3.0        NaN  7.0  NaN
2015-04-23        NaN        0.0  NaN  NaN
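Since the question asks how to adapt the groupby solution, one possible alternative (not part of the original answer) is to add value as a second grouping key and then unstack it. The sketch below assumes the same sample data as in the question and uses the duration_value0-style names the question asked for; the variable name means is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's printed frame (assumed dtypes).
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2015-03-28 09:07:00.800001",
        "2015-03-28 09:36:01.819998",
        "2015-03-30 09:36:06.839997",
        "2015-03-30 09:37:27.659997",
        "2015-04-22 09:51:40.440003",
        "2015-04-23 10:15:25.080002",
    ]),
    "duration": [0, 1, 1, np.nan, 3, 0],
    "km": [0, 2, 3, 5, 7, np.nan],
    "value": [0, 1, 1, 0, 0, 1],
})

# Group by calendar day and by value, average, then move value into the columns.
means = (
    df.groupby([pd.Grouper(key="Date", freq="D"), "value"])[["duration", "km"]]
      .mean()
      .unstack("value")
)

# Flatten the (measure, value) column pairs into names like duration_value0.
means.columns = [f"{name}_value{val}" for name, val in means.columns]
print(means)
```

Unlike the pivot_table version, the index here stays a DatetimeIndex (midnight timestamps) rather than plain date objects, which can be convenient for later resampling.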