Trouble averaging specific rows in Pandas DataFrame
Question
I currently have the following two DataFrames:
raw_data =
Time F1 F2 F3
2082-05-03 00:00:59.961599999 -83.769997 29.430000 29.400000
2082-05-03 00:02:00.009600000 -84.209999 28.940001 28.870001
2082-05-03 00:02:59.971200000 -84.339996 28.280001 28.320000
outage_by_timeofday_num = (made from raw_data) (ignore dashes - they are for alignment only)
F1 F2 F3
Time
2082-05-03 00:00:00 0 1 1
2082-05-03 01:00:00 0 1 1
I've been able to sort and average the raw_data DataFrame by times of the day using the following code (below), but I'm unable to do the same with the outage_by_timeofday_num DataFrame:
Doing this:
import pandas as pd
raw_data = pd.read_excel(r'/Users/linnk ....
raw_data[u'Time']= pd.to_datetime(raw_data['Time'], unit='d')
raw_data.set_index(pd.DatetimeIndex(raw_data[u'Time']), inplace=True)
raw_data.Time = pd.to_datetime(raw_data.Time)
def time_cat(t):
    hour = t.hour
    if(hour >= 5 and hour < 9):
        return 'Morning (5AM-9AM)'
    elif(hour >= 9 and hour < 18):
        return 'Day (9AM-6PM)'
    elif(hour >= 18 and hour < 22):
        return 'Evening (6PM-10PM)'
    else:
        return 'Night (10PM-5AM)'

by_timeofday = raw_data.groupby(raw_data.Time.apply(time_cat)).mean()
and the by_timeofday output is:
F1 F2 F3
Time
Day (9AM-6PM) -47.301852 23.070963 22.981000
Evening (6PM-10PM) -50.033000 24.011667 23.921833
Morning (5AM-9AM) -62.481130 48.417866 48.537197
Night (10PM-5AM) -71.372613 -71.289763 53.957411 \
However, this does not work:
outage_by_hour_num.Time= pd.to_datetime(outage_by_hour_num.Time)
outage_by_timeofday = outage_by_hour_num.groupby(outage_by_hour_num.Time.apply(time_cat)).sum(axis=1, numeric_only=True)
This gives the error: AttributeError: 'DataFrame' object has no attribute 'Time'
Can someone help me spot my error/the edit I need to make to sort my outage_by_timeofday_num DataFrame in the same way I sorted raw_data? In case it might be useful, outage_by_timeofday_num has been made in the following way:
ave_data = raw_data.resample('h', how='mean')
ave_data.index.name=u'Time'
summary_ave_data = ave_data.copy()
summary_ave_data['Hourly Substation Average'] = summary_ave_data.mean(numeric_only=True, axis=1)
outage_by_hour = summary_ave_data >= 0.05
outage_by_hour_num= outage_by_hour.astype(int)
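For readers on recent pandas releases: the how= argument to resample has been removed, and the aggregation is called as a method on the resampler instead. Below is a minimal sketch of the same steps under that assumption. The frame `demo` and its values are made up for illustration (they are not the asker's data), and the 'Hourly Substation Average' column is left out for brevity.

```python
import pandas as pd

# Hypothetical stand-in for raw_data: one-minute readings over two hours.
idx = pd.date_range("2082-05-03 00:00", periods=120, freq="min")
demo = pd.DataFrame({"F1": -84.0, "F2": 29.0, "F3": 0.01}, index=idx)

# Method-style aggregation replaces resample('h', how='mean').
ave_data = demo.resample("h").mean()

# This only names the index; it does not create a 'Time' column.
ave_data.index.name = "Time"

# Flag hourly means at or above the 0.05 threshold as 1, others as 0.
outage_by_hour_num = (ave_data >= 0.05).astype(int)
print(outage_by_hour_num)
```

In this sketch 'Time' appears only as the index name of outage_by_hour_num, which matches the layout of the table shown earlier in the question.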
Recommended answer
You got rid of your 'Time' column in ave_data.index.name=u'Time'.
Change it with:
ave_data.set_index('Time', drop=False, inplace=True)
That makes sure 'Time' is set as the index while the 'Time' column is kept.
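Another option, offered here as a sketch rather than as part of the original answer, is to categorize the index values directly, since the timestamps are still available as the DataFrame's index. The frame `demo` and its values are hypothetical; time_cat is the function from the question.

```python
import pandas as pd

def time_cat(t):
    # Map a timestamp to a time-of-day label based on its hour.
    hour = t.hour
    if 5 <= hour < 9:
        return 'Morning (5AM-9AM)'
    elif 9 <= hour < 18:
        return 'Day (9AM-6PM)'
    elif 18 <= hour < 22:
        return 'Evening (6PM-10PM)'
    else:
        return 'Night (10PM-5AM)'

# Hypothetical hourly outage flags for one day, indexed by 'Time'.
idx = pd.date_range("2082-05-03 00:00", periods=24, freq="h")
demo = pd.DataFrame({"F1": 0, "F2": 1, "F3": 1}, index=idx)
demo.index.name = "Time"

# 'Time' is the index name, not a column, so demo.Time would raise AttributeError.
assert "Time" not in demo.columns

# Index.map applies time_cat to each timestamp; groupby accepts the resulting labels.
outage_by_timeofday = demo.groupby(demo.index.map(time_cat)).sum()
print(outage_by_timeofday)
```

Note that set_index('Time', ...) needs 'Time' to exist as a column at the point where it is called. If 'Time' is only the index, calling reset_index() first turns it back into a column, or the index can be grouped directly as in this sketch.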