matplotlib: plotting timeseries while skipping over periods without data


Question

tl;dr: how can I skip over periods where there is no data while plotting timeseries?

I'm running a long calculation and I'd like to monitor its progress. Sometimes I interrupt this calculation. The logs are stored in a huge CSV file which looks like this:

2016-01-03T01:36:30.958199,0,0,0,startup
2016-01-03T01:36:32.363749,10000,0,0,regular
...
2016-01-03T11:12:21.082301,51020000,13402105,5749367,regular
2016-01-03T11:12:29.065687,51030000,13404142,5749367,regular
2016-01-03T11:12:37.657022,51040000,13408882,5749367,regular
2016-01-03T11:12:54.236950,51050000,13412824,5749375,shutdown
2016-01-03T19:02:38.293681,51050000,13412824,5749375,startup
2016-01-03T19:02:49.296161,51060000,13419181,5749377,regular
2016-01-03T19:03:00.547644,51070000,13423127,5749433,regular
2016-01-03T19:03:05.599515,51080000,13427189,5750183,regular
...

In reality, there are 41 columns. Each of the columns is a certain indicator of progress. The second column is always incremented in steps of 10000. The last column is self-explanatory.

I would like to plot each column on the same graph while skipping over periods between "shutdown" and "startup". Ideally, I would also like to draw a vertical line on each skip.
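The skipped periods can be read off the log itself by pairing each "shutdown" row with the "startup" row that follows it. The snippet below is a minimal sketch under a few assumptions: the status is the last CSV field, the timestamps are ISO 8601 as in the excerpt, and `find_breaks` and `SAMPLE_LOG` are hypothetical names introduced here for illustration.

```python
import csv
import io
from datetime import datetime

# A few rows copied from the log excerpt above (the real file has 41 columns).
SAMPLE_LOG = """\
2016-01-03T11:12:37.657022,51040000,13408882,5749367,regular
2016-01-03T11:12:54.236950,51050000,13412824,5749375,shutdown
2016-01-03T19:02:38.293681,51050000,13412824,5749375,startup
2016-01-03T19:02:49.296161,51060000,13419181,5749377,regular
"""


def find_breaks(rows):
    """Return (shutdown_time, startup_time) pairs for adjacent log rows.

    Assumes the timestamp is the first field and the status is the last.
    """
    breaks = []
    for prev, cur in zip(rows, rows[1:]):
        if prev[-1] == "shutdown" and cur[-1] == "startup":
            breaks.append((datetime.fromisoformat(prev[0]),
                           datetime.fromisoformat(cur[0])))
    return breaks


rows = list(csv.reader(io.StringIO(SAMPLE_LOG)))
breaks = find_breaks(rows)
```

Each pair marks one period to skip. The left edge of a pair is also a natural place for the vertical line, for example with `ax.axvline(left)`.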

Here's what I've got so far:

import matplotlib.pyplot as plt
import pandas as pd

# < ... reading my CSV in a Pandas dataframe `df` ... >

fig, ax = plt.subplots()

for col in ['total'] + ['%02d' % i for i in range(40)]:
    ax.plot_date(df.index.values, df[col].values, '-')

fig.autofmt_xdate()
plt.show()

I want to get rid of that long flat period and just draw a vertical line instead.

I know about df.plot(), but in my experience it's broken (among other things, Pandas converts datetime objects in its own format instead of using date2num and num2date).

It looks like a possible solution is to write a custom scaler, but that seems quite complicated.

As far as I understand, writing a custom Locator will only change the positions of ticks (little vertical lines and the associated labels), but not the position of the plot itself. Is that correct?

UPD: an easy solution would be to change the timestamps (say, recalculate them to "time elapsed since start"), but I'd prefer to preserve them.

UPD: the answer at https://stackoverflow.com/a/5657491/1214547 works for me with some modifications. I will write up my solution soon.

Recommended answer

@Pastafarianist provides a good solution. However, I found a bug in InvertedCustomTransform when plotting with more than one break. For example, in the following code the crosshair can't follow the cursor over the second and third breaks.

import bisect
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.scale as mscale
import matplotlib.transforms as mtransforms
import matplotlib.dates as mdates
import pandas as pd
from matplotlib.widgets import Cursor


def CustomScaleFactory(breaks):
    class CustomScale(mscale.ScaleBase):
        name = 'custom'

        def __init__(self, axis, **kwargs):
            mscale.ScaleBase.__init__(self)

        def get_transform(self):
            return self.CustomTransform()

        def set_default_locators_and_formatters(self, axis):
            class HourSkippingLocator(mdates.HourLocator):
                _breaks = breaks

                def __init__(self, *args, **kwargs):
                    super(HourSkippingLocator, self).__init__(*args, **kwargs)

                def _tick_allowed(self, tick):
                    for left, right in self._breaks:
                        if left <= tick <= right:
                            return False
                    return True

                def __call__(self):
                    ticks = super(HourSkippingLocator, self).__call__()
                    ticks = [tick for tick in ticks if self._tick_allowed(tick)
                             ]
                    ticks.extend(right for (left, right) in self._breaks)
                    return ticks

            axis.set_major_locator(HourSkippingLocator(interval=3))
            axis.set_major_formatter(mdates.DateFormatter("%b %d, %H:%M"))

        class CustomTransform(mtransforms.Transform):
            input_dims = 1
            output_dims = 1
            is_separable = True
            has_inverse = True
            _breaks = breaks

            def __init__(self):
                mtransforms.Transform.__init__(self)

            def transform_non_affine(self, a):
                # Actually, this transformation isn't exactly invertible.
                # It may glue together some points, and there is no way
                # to separate them back. This implementation maps both
                # points to the *left* side of the break.

                diff = np.zeros(len(a))

                total_shift = 0

                for left, right in self._breaks:
                    pos = bisect.bisect_right(a, left - total_shift)
                    if pos >= len(diff):
                        break
                    diff[pos] = right - left
                    total_shift += right - left

                return a + diff.cumsum()

            def inverted(self):
                return CustomScale.CustomTransform()

    return CustomScale

# simulated data
index1 = pd.date_range(start='2016-01-08 9:30', periods=10, freq='30s')
index2 = pd.date_range(end='2016-01-08 15:00', periods=10, freq='30s')
index = index1.union(index2)
data1 = pd.Series(range(20), index=index.values)
index3 = pd.date_range(start='2016-01-09 9:30', periods=10, freq='30s')
index4 = pd.date_range(end='2016-01-09 15:00', periods=10, freq='30s')
index = index3.union(index4)
data2 = pd.Series(range(20), index=index.values)
data = pd.concat([data1, data2])
# pd.datetime was removed from recent pandas releases; pd.Timestamp
# parses the same strings and is accepted by mdates.date2num.
breaks_dates = [
    pd.Timestamp('2016-01-08 09:35:00'),
    pd.Timestamp('2016-01-08 14:55:00'),
    pd.Timestamp('2016-01-08 15:00:00'),
    pd.Timestamp('2016-01-09 09:30:00'),
    pd.Timestamp('2016-01-09 09:35:00'),
    pd.Timestamp('2016-01-09 14:55:00')
]
breaks_dates = [mdates.date2num(point_i) for point_i in breaks_dates]
breaks = [(breaks_dates[i], breaks_dates[i + 1]) for i in [0, 2, 4]]
fig, ax = plt.subplots()
ax.plot(data.index.values, data.values)
mscale.register_scale(CustomScaleFactory(breaks))
ax.set_xscale('custom')
cursor = Cursor(ax, useblit=True, color='r', linewidth=2)
plt.show()

If the 'transform_non_affine' function in the 'InvertedCustomTransform' class is changed as follows, it works well.

def transform_non_affine(self, a):
    # Actually, this transformation isn't exactly invertible.
    # It may glue together some points, and there is no way
    # to separate them back. This implementation maps both
    # points to the *left* side of the break.

    diff = np.zeros(len(a))

    total_shift = 0

    for left, right in self._breaks:
        pos = bisect.bisect_right(a, left - total_shift)
        if pos >= len(diff):
            break
        diff[pos] = right - left + total_shift  # changed point
        total_shift += right - left
    return a + diff  # changed point

The reason may be that the input 'a' passed to the transformation method is not the whole axis; it is only a numpy.array of length 1.
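If that is indeed the cause, one way to avoid depending on the shape or ordering of 'a' is to compute the shift for each element independently with np.searchsorted. The following is a rough sketch rather than the code from either answer: `compress` and `expand` are hypothetical helper names, and the breaks are assumed to be sorted, non-overlapping (left, right) pairs in the same numeric units as the axis (for example the output of mdates.date2num).

```python
import numpy as np


def compress(x, breaks):
    """Map real coordinates to axis coordinates with every break collapsed.

    Points that fall inside a break are mapped to its left edge.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    lefts = np.array([left for left, _ in breaks], dtype=float)
    rights = np.array([right for _, right in breaks], dtype=float)
    # cum[k] is the total width of the first k breaks.
    cum = np.concatenate(([0.0], np.cumsum(rights - lefts)))
    # Number of breaks that lie entirely at or before each point.
    n_passed = np.searchsorted(rights, x, side="right")
    out = x - cum[n_passed]
    # Index of the last break whose left edge is <= x (-1 if none).
    idx = np.searchsorted(lefts, x, side="right") - 1
    inside = (idx >= 0) & (idx == n_passed)
    out[inside] = lefts[idx[inside]] - cum[idx[inside]]
    return out


def expand(y, breaks):
    """Approximate inverse of compress: re-insert the width of each break."""
    y = np.atleast_1d(np.asarray(y, dtype=float))
    lefts = np.array([left for left, _ in breaks], dtype=float)
    rights = np.array([right for _, right in breaks], dtype=float)
    cum = np.concatenate(([0.0], np.cumsum(rights - lefts)))
    # Positions of the left edges after compression.
    compressed_lefts = lefts - cum[:-1]
    # A point exactly on a collapsed break maps back to its left edge.
    n_passed = np.searchsorted(compressed_lefts, y, side="left")
    return y + cum[n_passed]


example_breaks = [(10.0, 20.0), (30.0, 50.0)]
squeezed = compress([60.0, 5.0, 25.0], example_breaks)
restored = expand(squeezed, example_breaks)
```

Because each element is looked up on its own, the result is the same whether the method receives the whole axis, an unsorted array, or a single cursor position. These two functions could serve as the bodies of the forward and inverted transforms, though that wiring is not shown here.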
