How to overlay data over a "day period" in Pandas for plotting


Question

I have a DataFrame with some (more-sensical) data in the following form:

In[67]: df
Out[67]: 
                             latency
timestamp                           
2016-09-15 00:00:00.000000  0.042731
2016-09-15 00:16:24.376901  0.930874
2016-09-15 00:33:19.268295  0.425996
2016-09-15 00:51:30.956065  0.570245
2016-09-15 01:09:23.905364  0.044203
                             ...
2017-01-13 13:08:31.707328  0.071137
2017-01-13 13:25:41.154199  0.322872
2017-01-13 13:38:19.732391  0.193918
2017-01-13 13:57:36.687049  0.999191

So it spans about 50 days, and the timestamps are not at the same time every day. I would like to overlay some plots for each day, that is, inspect the time series of each day on the same plot. 50 days may be too many lines, but I think there is a kind of "daily seasonality" which I would like to investigate, and this seems like a useful visualization before anything more rigorous.

How can I overlay this data on a single plot whose x-axis represents a "single day" time period?

My thoughts

I am not yet very familiar with Pandas, but I managed to group my data into daily bins with

In[67]: df.groupby(pd.TimeGrouper('D'))
Out[68]: <pandas.core.groupby.DataFrameGroupBy object at 0x000000B698CD34E0>

Now I've been trying to determine how I am supposed to create a new DataFrame structure such that the plots can be overlaid by day. This is the fundamental thing I can't figure out: how can I use a DataFrameGroupBy object to overlay the plots? A very rudimentary-seeming approach would be to just iterate over each group, but my issue with doing so has been configuring the x-axis so that it only displays a "daily time period" independent of the particular day, instead of capturing the entire timestamp.

Splitting the data up into separate frames and calling them in the same figure with some kind of date coercion to use the approach in this more general answer doesn't seem very good to me.
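
For reference, the rudimentary approach of iterating over daily groups can be given a date-independent x-axis by subtracting each day's midnight from its timestamps. The following is only a minimal sketch on hypothetical sample data; the `daily` dictionary and the fixed 17-minute spacing are assumptions made for illustration and are not part of the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 200 readings, 17 minutes apart, spanning three days.
rng = np.random.default_rng(0)
index = pd.date_range("2016-09-15", periods=200, freq="17min")
df = pd.DataFrame({"latency": rng.random(len(index))}, index=index)

# Group rows by calendar day (normalize() sets every timestamp to midnight).
# Within each group, replace the timestamp with "hours since midnight",
# so every day shares the same 0-24 h axis.
daily = {}
for day, group in df.groupby(df.index.normalize()):
    hours = (group.index - group.index.normalize()).total_seconds() / 3600
    daily[day] = pd.Series(group["latency"].to_numpy(), index=hours)
```

Each series in `daily` can then be drawn onto one shared axes, for example with `series.plot(ax=ax)`, without any per-day date coercion.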

You can generate pseudo-data similarly with something like this:

import datetime

import numpy as np

start_date = datetime.datetime(2016, 9, 15)
end_date = datetime.datetime.now()

dts = []
cur_date = start_date
while cur_date < end_date:
    dts.append((cur_date, np.random.rand()))
    cur_date = cur_date + datetime.timedelta(minutes=np.random.uniform(10, 20))

Recommended answer

Consider the dataframe df (generated mostly from OP provided code)

import datetime 

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

start_date = datetime.datetime(2016, 9, 15)
end_date = datetime.datetime.now()

dts = []
cur_date = start_date
while cur_date < end_date:
    dts.append((cur_date, np.random.rand()))
    cur_date = cur_date + datetime.timedelta(minutes=np.random.uniform(10, 20))


df = pd.DataFrame(dts, columns=['Date', 'Value']).set_index('Date')


The real trick is splitting the index into date and time components and unstacking, then interpolating to fill in missing values.

d1 = df.copy()
d1.index = [d1.index.time, d1.index.date]
d1 = d1.Value.unstack().interpolate()
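
To see what the split-and-unstack step produces, here is a small sketch on hypothetical toy data (the `toy`, `wide` and `filled` names are illustrative only). The time of day becomes the row index, each calendar date becomes a column, and `interpolate()` fills the gaps that appear because the days were sampled at different times.

```python
import pandas as pd

# Hypothetical toy data: two readings on each of two days, at differing times.
toy = pd.DataFrame(
    {"Value": [1.0, 2.0, 3.0, 4.0]},
    index=pd.to_datetime([
        "2016-09-15 00:00", "2016-09-15 12:00",
        "2016-09-16 06:00", "2016-09-16 12:00",
    ]),
)

wide = toy.copy()
# Level 0 is the time of day, level 1 is the calendar date.
wide.index = [wide.index.time, wide.index.date]
# unstack() moves the innermost level (the date) into the columns,
# leaving NaN wherever a day has no reading at that time.
wide = wide.Value.unstack()

# interpolate() fills interior gaps column by column (linear by default);
# a gap before a day's first reading stays NaN.
filled = wide.interpolate()
```

Note that the default linear interpolation treats the rows as equally spaced, so with irregular timestamps the filled values are only approximate.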

From here we can call d1.plot(legend=0):

ax = d1.plot(legend=0)
ax.figure.autofmt_xdate()

But this isn't very helpful.

You might try something like this... hopefully this helps

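# pd.TimeGrouper was removed in pandas 1.0; pd.Grouper(freq='W') replaces it.
# Grouper needs a DatetimeIndex, so convert the date labels first.
by_day = d1.T
by_day.index = pd.to_datetime(by_day.index)

n, m = len(d1.columns) // 7 // 4 + 1, 4
# squeeze=False keeps axes two-dimensional even when there is a single row
fig, axes = plt.subplots(n, m, figsize=(10, 15), sharex=False, squeeze=False)

for i, (w, g) in enumerate(by_day.groupby(pd.Grouper(freq='W'))):
    r, c = i // m, i % m
    ax = g.T.plot(ax=axes[r, c], title=str(w.date()), legend=0)

fig.autofmt_xdate()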

How to do it over weeks

  • create a multi index
    • include the period representing the week
    • include the day of the week
    • include the time of day
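d2 = df.copy()

idx = df.index
# idx.weekday_name was removed in pandas 1.0; idx.day_name() replaces it
d2.index = [idx.day_name(), idx.time, idx.to_period('W').rename('Week')]

# unstack the week level into columns and plot the first two weeks
ax = d2.Value.unstack().interpolate().iloc[:, :2].plot()
ax.figure.autofmt_xdate()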
