How to bin time in a pandas dataframe

Views: 76 | Published: 2020/10/18 22:37:08 | Tags: python pandas datetime pandas-groupby


Problem description

I am trying to analyze average daily fluctuations in a measurement "X" over several weeks using pandas dataframes, but timestamps/datetimes etc. are proving particularly hellish to deal with. Having spent a good few hours trying to work this out, my code is getting messier and messier and I don't think I'm any closer to a solution. I'm hoping someone here can guide me in the right direction.

I have measured X at different times and on different days, saving the daily results to a dataframe which has the form:

    Timestamp(datetime64)         X 

0    2015-10-05 00:01:38          1
1    2015-10-05 06:03:39          4 
2    2015-10-05 13:42:39          3
3    2015-10-05 22:15:39          2

Because the time at which the measurement is made changes from day to day, I decided to use binning to organize the data and then work out the average and standard deviation for each bin, which I can then plot. My idea was to create a final dataframe with the bins and the average value of X for the measurements in each bin; the 'Observations' column is just to aid understanding:

        Time Bin       Observations     <X>  

0     00:00-05:59      [ 1 , ...]       2.3
1     06:00-11:59      [ 4 , ...]       4.6
2     12:00-17:59      [ 3 , ...]       8.5
3     18:00-23:59      [ 2 , ...]       3.1

However, I've run into difficulties with incompatibility between time, datetime, datetime64 and timedelta, and with binning using pd.cut and groupby. Basically, I feel like I'm making stabs in the dark with no idea as to the 'right' way to approach this problem. The only solution I can think of is a row-by-row iteration through the dataframe, but I'd really like to avoid having to do this.

Solution

Whenever I bin time series data by a time range, which seems to be what you are doing here, I just create an "hour of day" column and slice over that. Also, I normally set the index as datetime values...though that is not necessary here.

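# assuming your "timestamp" column is labeled ts and has dtype datetime64:
df['hod'] = df['ts'].dt.hour

# now you can calculate stats for each bin, e.g. the 00:00-05:59 bin
in_bin = (df['hod'] >= 0) & (df['hod'] < 6)
ave = df.loc[in_bin, 'X'].mean()
std = df.loc[in_bin, 'X'].std()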
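The same hour-of-day idea can probably be extended to all four bins at once with pd.cut and groupby. The following is only a minimal sketch under some assumptions: the column names "ts" and "X", the helper columns "hod" and "time_bin", and the bin labels are chosen for illustration and do not come from the question's actual code.

```python
import pandas as pd

# Sample data from the question; the column names "ts" and "X" are assumptions.
df = pd.DataFrame({
    "ts": pd.to_datetime([
        "2015-10-05 00:01:38",
        "2015-10-05 06:03:39",
        "2015-10-05 13:42:39",
        "2015-10-05 22:15:39",
    ]),
    "X": [1, 4, 3, 2],
})

# Hour of day (0-23) for every row, computed without a Python-level loop.
df["hod"] = df["ts"].dt.hour

# Left-closed 6-hour bins: [0, 6), [6, 12), [12, 18), [18, 24).
edges = [0, 6, 12, 18, 24]
labels = ["00:00-05:59", "06:00-11:59", "12:00-17:59", "18:00-23:59"]
df["time_bin"] = pd.cut(df["hod"], bins=edges, right=False, labels=labels)

# Mean, standard deviation and count of X per bin.
stats = df.groupby("time_bin", observed=False)["X"].agg(["mean", "std", "count"])
print(stats)
```

With only one observation per bin, as in this four-row sample, the standard deviation comes out as NaN; it only becomes meaningful once several days of data fall into each bin.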

I would think there is a way to use df.resample here, but because the starting/ending points in your time series are poorly defined, I think it would require more care than the method above.
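For comparison, here is a rough sketch of what a resample-based route might look like. It is an illustration under assumptions, not a tested recipe for the asker's data: the two-day sample values and the column names "ts" and "X" are hypothetical. Note that it averages the per-day window means, which is not the same statistic as pooling all observations in a window when days have different numbers of measurements.

```python
import pandas as pd

# Hypothetical two-day sample; the column names "ts" and "X" are assumptions.
df = pd.DataFrame({
    "ts": pd.to_datetime([
        "2015-10-05 00:01:38",
        "2015-10-05 06:03:39",
        "2015-10-06 01:30:00",
        "2015-10-06 07:45:00",
    ]),
    "X": [1.0, 4.0, 3.0, 2.0],
})

# resample needs a datetime index; 6-hour bins start at midnight of the first day.
per_day = df.set_index("ts")["X"].resample(pd.Timedelta(hours=6)).mean()

# per_day has one row per (day, 6-hour window); windows with no data are NaN.
# Grouping by the window's starting hour collapses the days together.
by_window = per_day.groupby(per_day.index.hour).mean()
print(by_window)
```

The extra grouping step is the "more care" part: resample alone bins along the calendar, so the time-of-day windows still have to be merged across days afterwards.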

Is this along the lines of what you were wanting?
