How to plot and work with NaN values in matplotlib

Question

I have hourly data consisting of a number of columns. The first column is a date (date_log), and the remaining columns contain different sample points. The trouble is that the sample points are logged at different times, even within the same hour, so every column has at least a couple of NaN values. If I plot the data using the first method below, it works nicely, but I want gaps where there is no logger data for a day or so, and I do not want the points on either side to be joined. If I use the second method, I can see the gaps, but because of the NaN points the data points are not joined at all. In the example below, I'm just plotting the first three columns.

When there is a big gap, as in the blue series (01/06-01/07/2015), I want the plot to show a gap and then resume joining the points. The second example does not join the points at all. I like the first chart, but I want to create gaps like the second method does wherever there are no sample data points for a 24-hour range or so, leaving longer stretches of missing data as gaps.

Is there any workaround? Thanks

Method 1:

Log_1a_mask = np.isfinite(Log_1a) # Log_1a is column 2 data points
Log_1b_mask = np.isfinite(Log_1b) # Log_1b is column 3 data points

plt.plot_date(date_log[Log_1a_mask], Log_1a[Log_1a_mask], linestyle='-', marker='',color='r',)
plt.plot_date(date_log[Log_1b_mask], Log_1b[Log_1b_mask], linestyle='-', marker='', color='b')
plt.show()

Method 2:

plt.plot_date(date_log, Log_1a, '-r*', markersize=2, markeredgewidth=0)  # Log_1a contains raw data with NaN
plt.plot_date(date_log, Log_1b, '-b*', markersize=2, markeredgewidth=0)  # Log_1b contains raw data with NaN
plt.show()

Method 1 output: [figure not included]

Method 2 output: [figure not included]

Recommended answer

If I'm understanding you correctly, you have a dataset with lots of small gaps (single NaNs) that you want filled and larger gaps that you don't.

One option is to use pandas fillna with a limited number of fill values.

As a quick example of how this works:

In [1]: import pandas as pd; import numpy as np

In [2]: x = pd.Series([1, np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, 4])

In [3]: x.fillna(method='ffill', limit=1)
Out[3]:
0     1
1     1
2     2
3     2
4   NaN
5     3
6     3
7   NaN
8   NaN
9     4
dtype: float64

In [4]: x.fillna(method='ffill', limit=2)
Out[4]:
0     1
1     1
2     2
3     2
4     2
5     3
6     3
7     3
8   NaN
9     4
dtype: float64
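
On recent pandas releases, passing `method=` to `fillna` is deprecated in favor of the dedicated `Series.ffill` method, which accepts the same `limit` argument. A minimal sketch of the equivalent calls, assuming a pandas version that provides `ffill(limit=...)` (the names `filled_1` and `filled_2` are only illustrative):

```python
import numpy as np
import pandas as pd

x = pd.Series([1, np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, 4])

# Forward-fill at most one consecutive NaN after each valid value
filled_1 = x.ffill(limit=1)

# Forward-fill at most two consecutive NaNs after each valid value
filled_2 = x.ffill(limit=2)

print(filled_1)
print(filled_2)
```

The filled values should match the two sessions above; only the spelling of the call differs.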

As an example of using this for something similar to your case:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(1977)

x = np.random.normal(0, 1, 1000).cumsum()

# Set every third value to NaN
x[::3] = np.nan

# Set a few bigger gaps...
x[20:100], x[200:300], x[400:450] = np.nan, np.nan, np.nan

# Use pandas with a limited forward fill
# You may want to adjust the `limit` here. This will fill 2 nan gaps.
filled = pd.Series(x).fillna(limit=2, method='ffill')

# Let's plot the results
fig, axes = plt.subplots(nrows=2, sharex=True)
axes[0].plot(x, color='lightblue')
axes[1].plot(filled, color='lightblue')

axes[0].set(ylabel='Original Data')
axes[1].set(ylabel='Filled Data')

plt.show()

Alternatively, we can do this using only numpy. It's possible (and more efficient) to do a "forward fill" identical to the pandas method above, but I'll show another method to give you more options than just repeating values.
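
For reference, a numpy-only limited forward fill might look roughly like the sketch below. The function name `ffill_limited` is hypothetical, and the sketch assumes a 1-D float array; unlike pandas, it also treats infinities as missing because it uses `np.isfinite`.

```python
import numpy as np

def ffill_limited(values, limit=None):
    """
    Forward-fill missing entries of a 1-D array, filling at most `limit`
    consecutive entries after each valid value (hypothetical helper).
    """
    values = np.asarray(values, dtype=float)
    idx = np.arange(values.size)
    valid = np.isfinite(values)

    # Index of the most recent valid sample at or before each position
    # (-1 where no valid sample has been seen yet)
    last = np.maximum.accumulate(np.where(valid, idx, -1))

    # Copy the last valid value forward; leading gaps stay NaN
    filled = np.where(last >= 0, values[np.maximum(last, 0)], np.nan)

    if limit is not None:
        # Blank out positions that are too far from the last valid sample
        filled[(idx - last) > limit] = np.nan

    return filled

x = [1, np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, 4]
print(ffill_limited(x, limit=1))
print(ffill_limited(x, limit=2))
```

The running maximum of valid indices avoids an explicit Python loop over the data.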

Instead of repeating the last value through the "gap", we can perform linear interpolation of the values in the gap. This is less efficient computationally (and I'm going to make it even less efficient by interpolating everywhere), but for most datasets you won't notice a major difference.

As an example, let's define an interpolate_gaps function:

def interpolate_gaps(values, limit=None):
    """
    Fill gaps using linear interpolation, optionally only fill gaps up to a
    size of `limit`.
    """
    values = np.asarray(values)
    i = np.arange(values.size)
    valid = np.isfinite(values)
    filled = np.interp(i, i[valid], values[valid])

    if limit is not None:
        invalid = ~valid
        for n in range(1, limit+1):
            invalid[:-n] &= invalid[n:]
        filled[invalid] = np.nan

    return filled

Note that we'll get interpolated values, unlike the previous pandas version:

In [11]: values = [1, np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, 4]

In [12]: interpolate_gaps(values, limit=1)
Out[12]:
array([ 1.        ,  1.5       ,  2.        ,         nan,  2.66666667,
        3.        ,         nan,         nan,  3.75      ,  4.        ])
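
As the output above shows, `interpolate_gaps` with a `limit` still fills the tail end of gaps that are longer than `limit` (indices 4 and 8 here). If you would rather leave any gap longer than `limit` completely empty, one possible variant is sketched below. The name `interpolate_short_gaps` is hypothetical, and this is an alternative to the function above, not a drop-in replacement.

```python
import numpy as np

def interpolate_short_gaps(values, limit=None):
    """
    Linearly interpolate across runs of missing values, but leave any run
    longer than `limit` entirely as NaN (hypothetical variant).
    """
    values = np.asarray(values, dtype=float)
    valid = np.isfinite(values)
    if not valid.any():
        # Nothing to interpolate from
        return values.copy()

    i = np.arange(values.size)
    filled = np.interp(i, i[valid], values[valid])

    if limit is not None:
        # Locate each run of consecutive missing values as [start, stop)
        padded = np.concatenate(([False], ~valid, [False]))
        edges = np.diff(padded.astype(int))
        starts = np.flatnonzero(edges == 1)
        stops = np.flatnonzero(edges == -1)

        # Undo the interpolation for runs that exceed the limit
        for start, stop in zip(starts, stops):
            if stop - start > limit:
                filled[start:stop] = np.nan

    return filled

values = [1, np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, 4]
print(interpolate_short_gaps(values, limit=1))
print(interpolate_short_gaps(values, limit=2))
```

With `limit=1` only the single NaN at index 1 is filled; with `limit=2` the two-sample gap is filled as well, while the three-sample gap stays empty.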

In the plotting example, if we replace the line:

filled = pd.Series(x).fillna(limit=2, method='ffill')

with:

filled = interpolate_gaps(x, limit=2)

We'll get a visually identical plot:

As a complete, stand-alone example:

import numpy as np
import matplotlib.pyplot as plt
np.random.seed(1977)

def interpolate_gaps(values, limit=None):
    """
    Fill gaps using linear interpolation, optionally only fill gaps up to a
    size of `limit`.
    """
    values = np.asarray(values)
    i = np.arange(values.size)
    valid = np.isfinite(values)
    filled = np.interp(i, i[valid], values[valid])

    if limit is not None:
        invalid = ~valid
        for n in range(1, limit+1):
            invalid[:-n] &= invalid[n:]
        filled[invalid] = np.nan

    return filled

x = np.random.normal(0, 1, 1000).cumsum()

# Set every third value to NaN
x[::3] = np.nan

# Set a few bigger gaps...
x[20:100], x[200:300], x[400:450] = np.nan, np.nan, np.nan

# Interpolate small gaps using numpy
filled = interpolate_gaps(x, limit=2)

# Let's plot the results
fig, axes = plt.subplots(nrows=2, sharex=True)
axes[0].plot(x, color='lightblue')
axes[1].plot(filled, color='lightblue')

axes[0].set(ylabel='Original Data')
axes[1].set(ylabel='Filled Data')

plt.show()
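
For timestamped data like the question's, another possibility is to decide where to break the line based on elapsed time rather than on the number of consecutive NaNs. The sketch below drops the NaN samples (as Method 1 in the question does) and then re-inserts a single NaN wherever two consecutive valid samples are further apart than a threshold. The function name `break_long_gaps`, the sample data, and the 24-hour threshold are all illustrative assumptions, and the sketch assumes `times` is a sorted numpy `datetime64` array.

```python
import numpy as np

def break_long_gaps(times, values, max_gap):
    """
    Drop missing samples, then insert one NaN after every pair of valid
    samples separated by more than `max_gap` (hypothetical helper).
    """
    times = np.asarray(times)
    values = np.asarray(values, dtype=float)

    # Keep only the valid samples so short gaps are joined by a line
    keep = np.isfinite(values)
    t, v = times[keep], values[keep]

    # Positions in the compressed arrays that follow a too-long time gap
    breaks = np.flatnonzero(np.diff(t) > max_gap) + 1

    # Insert a NaN sample before each such position; its timestamp just
    # repeats the previous sample's, since only the NaN value matters
    t_out = np.insert(t, breaks, t[breaks - 1])
    v_out = np.insert(v, breaks, np.full(breaks.size, np.nan))
    return t_out, v_out

# Made-up hourly data with a short gap and a gap of more than a day
base = np.datetime64('2015-01-05T00', 'h')
times = base + np.array([0, 1, 2, 30, 31]) * np.timedelta64(1, 'h')
values = [1.0, np.nan, 2.0, 3.0, 4.0]

t_out, v_out = break_long_gaps(times, values, np.timedelta64(24, 'h'))
print(t_out)
print(v_out)
```

The returned arrays can be passed to `plt.plot(t_out, v_out, '-')`; matplotlib does not draw a line segment through a NaN value, so the long gap stays open while the short one is joined.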

Note: I originally completely misread the question. See the version history for my original answer.
