How to bin time in a pandas dataframe
Problem description
I am trying to analyze average daily fluctuations in a measurement "X" over several weeks using pandas dataframes, but timestamps, datetimes, etc. are proving particularly hellish to deal with. Having spent a good few hours trying to work this out, my code is getting messier and messier and I don't think I'm any closer to a solution. I'm hoping someone here can guide me in the right direction.
I have measured X at different times and on different days, saving the daily results to a dataframe which has the form:
Timestamp(datetime64) X
0 2015-10-05 00:01:38 1
1 2015-10-05 06:03:39 4
2 2015-10-05 13:42:39 3
3 2015-10-05 22:15:39 2
As the time at which the measurement is made changes from day to day, I decided to use binning to organize the data and then work out the average and standard deviation for each bin, which I can then plot. My idea was to create a final dataframe with the bins and the average value of X for the measurements in each bin. The 'Observations' column is just to aid understanding:
Time Bin Observations <X>
0 00:00-05:59 [ 1 , ...] 2.3
1 06:00-11:59 [ 4 , ...] 4.6
2 12:00-17:59 [ 3 , ...] 8.5
3 18:00-23:59 [ 2 , ...] 3.1
However, I've run into difficulties with incompatibility between time, datetime, datetime64, and timedelta, and with binning using pd.cut and pd.groupby. Basically, I feel like I'm making stabs in the dark with no idea as to the 'right' way to approach this problem. The only solution I can think of is a row-by-row iteration through the dataframe, but I'd really like to avoid having to do this.
Whenever I bin time series data by a time range, which seems to be what you are doing here, I just create an "hour of day" column and slice over that. Also, I normally set the index to the datetime values, though that is not necessary here.
# assuming your "timestamp" column is labeled ts and has dtype datetime64:
df['hod'] = df['ts'].dt.hour
# now you can calculate stats for each bin, e.g. the mean of X for 00:00-05:59
ave = df.loc[(df['hod'] >= 0) & (df['hod'] < 6), 'X'].mean()
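The same "hour of day" idea can be extended to all four bins at once. The following is only a minimal sketch, not necessarily how the original code was meant to continue: it assumes the timestamp column is named ts, that the helper columns hod and time_bin are free to use, and it adds two made-up rows for 2015-10-06 so that some bins hold more than one observation. It uses pd.cut on the hour column and then groups by the resulting bin.

```python
import pandas as pd

# Illustrative data in the shape shown in the question.
# The two rows for 2015-10-06 are invented for this example.
df = pd.DataFrame({
    "ts": pd.to_datetime([
        "2015-10-05 00:01:38",
        "2015-10-05 06:03:39",
        "2015-10-05 13:42:39",
        "2015-10-05 22:15:39",
        "2015-10-06 01:30:00",
        "2015-10-06 07:45:00",
    ]),
    "X": [1, 4, 3, 2, 3, 6],
})

# Hour of day (0-23) for every measurement, regardless of the date.
df["hod"] = df["ts"].dt.hour

# Left-closed bins: [0, 6), [6, 12), [12, 18), [18, 24).
edges = [0, 6, 12, 18, 24]
labels = ["00:00-05:59", "06:00-11:59", "12:00-17:59", "18:00-23:59"]
df["time_bin"] = pd.cut(df["hod"], bins=edges, right=False, labels=labels)

# Mean, standard deviation and number of observations per bin.
# observed=False keeps bins that happen to contain no data.
stats = df.groupby("time_bin", observed=False)["X"].agg(["mean", "std", "count"])
print(stats)
```

Because the bins are built from the hour alone, measurements from different days fall into the same bin, which is what averaging the daily fluctuation requires. A bin with a single observation has an undefined standard deviation, so it shows up as NaN.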
I would think there is a way to use df.resample here, but with the poorly defined starting and ending points in your time series, I think it may require more care than the method above.
Is this along the lines of what you were wanting?