Working with NaN values in matplotlib

Question

I have hourly data consisting of a number of columns. The first column is a date (date_log), and the remaining columns contain different sample points. The trouble is that the sample points are logged at different times, even on an hourly basis, so every column has at least a couple of NaNs. If I plot using the first code below, it works nicely, but I want gaps where there is no logger data for a day or so, and I do not want those points to be joined. If I use the second code, I can see the gaps, but because of the NaN points the data points are not joined. In the example below, I'm just plotting the first three columns.

When there is a big gap, like the blue points (01/06-01/07/2015), I want a gap rather than the points being joined. The second example does not join the points at all. I like the first chart, but I want it to show gaps the way the second method does when there are no sample data points for a 24 h date range or so, leaving data missing over longer periods as a gap.

Is there any workaround? Thanks.

Method 1:

Log_1a_mask = np.isfinite(Log_1a) # Log_1a is column 2 data points
Log_1b_mask = np.isfinite(Log_1b) # Log_1b is column 3 data points

plt.plot_date(date_log[Log_1a_mask], Log_1a[Log_1a_mask], linestyle='-', marker='',color='r',)
plt.plot_date(date_log[Log_1b_mask], Log_1b[Log_1b_mask], linestyle='-', marker='', color='b')
plt.show()

Method 2:

plt.plot_date(date_log, Log_1a, '-r*', markersize=2, markeredgewidth=0, color='r')  # Log_1a contains raw data with NaN
plt.plot_date(date_log, Log_1b, '-r*', markersize=2, markeredgewidth=0, color='r')  # Log_1b contains raw data with NaN
plt.show()

Output of method 1:

Output of method 2:

Recommended answer

If I'm understanding you correctly, you have a dataset with lots of small gaps (single NaNs) that you want filled and larger gaps that you don't.

One option is to use pandas fillna with a limited number of fill values.

As a quick example of how this works:

In [1]: import pandas as pd; import numpy as np

In [2]: x = pd.Series([1, np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, 4])

In [3]: x.fillna(method='ffill', limit=1)
Out[3]:
0     1
1     1
2     2
3     2
4   NaN
5     3
6     3
7   NaN
8   NaN
9     4
dtype: float64

In [4]: x.fillna(method='ffill', limit=2)
Out[4]:
0     1
1     1
2     2
3     2
4     2
5     3
6     3
7     3
8   NaN
9     4
dtype: float64
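Recent pandas releases deprecate the `method=` argument of `fillna` in favour of the dedicated `ffill` method. The sketch below repeats the example above with `Series.ffill(limit=...)`; the variable names `filled_1` and `filled_2` are mine, and the result should match the outputs shown above (newer pandas prints the values as floats, e.g. `1.0`).

```python
import numpy as np
import pandas as pd

x = pd.Series([1, np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, 4])

# Forward-fill at most one consecutive NaN after each valid value
filled_1 = x.ffill(limit=1)

# Forward-fill at most two consecutive NaNs after each valid value
filled_2 = x.ffill(limit=2)

print(filled_1)
print(filled_2)
```

In both cases a run of NaNs longer than `limit` is only filled for its first `limit` entries; the rest of the run stays NaN.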

As an example of using this for something similar to your case:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(1977)

x = np.random.normal(0, 1, 1000).cumsum()

# Set every third value to NaN
x[::3] = np.nan

# Set a few bigger gaps...
x[20:100], x[200:300], x[400:450] = np.nan, np.nan, np.nan

# Use pandas with a limited forward fill
# You may want to adjust the `limit` here. This will fill 2 nan gaps.
filled = pd.Series(x).fillna(limit=2, method='ffill')

# Let's plot the results
fig, axes = plt.subplots(nrows=2, sharex=True)
axes[0].plot(x, color='lightblue')
axes[1].plot(filled, color='lightblue')

axes[0].set(ylabel='Original Data')
axes[1].set(ylabel='Filled Data')

plt.show()

Alternatively, we can do this using only numpy. It's possible (and more efficient) to do a "forward fill" identical to the pandas method above, but I'll show another method to give you more options than just repeating values.
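For reference, a limited forward fill in plain numpy might look like the following sketch. The function name `ffill_limited` is hypothetical (it is not a numpy function and is not part of the original answer), and the code assumes a one-dimensional array of floats.

```python
import numpy as np

def ffill_limited(values, limit=None):
    """
    Forward-fill non-finite entries with the last valid value, filling at
    most `limit` consecutive entries after each valid sample.
    """
    values = np.asarray(values, dtype=float)
    idx = np.arange(values.size)
    valid = np.isfinite(values)

    # Index of the most recent valid sample at or before each position
    # (-1 where no valid sample has been seen yet)
    last = np.maximum.accumulate(np.where(valid, idx, -1))

    # Copy the last valid value forward; leading gaps stay NaN
    filled = np.where(last >= 0, values[np.maximum(last, 0)], np.nan)

    if limit is not None:
        # Positions more than `limit` steps past the last valid sample stay NaN
        filled[(idx - last) > limit] = np.nan
    return filled

values = [1, np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, 4]
print(ffill_limited(values, limit=1))
```

On the example series this should reproduce the pandas result with `limit=1`.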

Instead of repeating the last value through the "gap", we can perform linear interpolation of the values in the gap. This is less efficient computationally (and I'm going to make it even less efficient by interpolating everywhere), but for most datasets you won't notice a major difference.

As an example, let's define an interpolate_gaps function:

def interpolate_gaps(values, limit=None):
    """
    Fill gaps using linear interpolation, optionally only fill gaps up to a
    size of `limit`.
    """
    values = np.asarray(values)
    i = np.arange(values.size)
    valid = np.isfinite(values)
    filled = np.interp(i, i[valid], values[valid])

    if limit is not None:
        invalid = ~valid
        for n in range(1, limit+1):
            invalid[:-n] &= invalid[n:]
        filled[invalid] = np.nan

    return filled

Note that we'll get interpolated values, unlike the previous pandas version:

In [11]: values = [1, np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, 4]

In [12]: interpolate_gaps(values, limit=1)
Out[12]:
array([ 1.        ,  1.5       ,  2.        ,         nan,  2.66666667,
        3.        ,         nan,         nan,  3.75      ,  4.        ])
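Note that in the output above, a gap longer than `limit` is still partly filled: its last `limit` positions receive interpolated values (2.667 and 3.75). If you would rather leave every long gap completely empty, one possible variant, sketched here under the hypothetical name `interpolate_short_gaps`, measures the length of each run of NaNs and blanks out the whole run when it exceeds `limit`.

```python
import numpy as np

def interpolate_short_gaps(values, limit):
    """
    Linearly interpolate runs of at most `limit` non-finite samples and
    leave longer runs entirely as NaN.
    """
    values = np.asarray(values, dtype=float)
    i = np.arange(values.size)
    valid = np.isfinite(values)
    filled = np.interp(i, i[valid], values[valid])

    # Find the start and (exclusive) end of every run of invalid samples
    padded = np.concatenate(([False], ~valid, [False]))
    edges = np.flatnonzero(np.diff(padded.astype(int)))
    starts, ends = edges[::2], edges[1::2]

    # Blank out every run that is longer than `limit`
    for start, end in zip(starts, ends):
        if end - start > limit:
            filled[start:end] = np.nan
    return filled

values = [1, np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, 4]
print(interpolate_short_gaps(values, limit=1))
```

With `limit=1` only the single NaN at index 1 is filled (with 1.5); the two longer gaps stay empty. As with `interpolate_gaps`, `np.interp` fills short gaps at the very start or end of the array with the nearest valid value rather than extrapolating.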

In the plotting example, if we replace the line:

filled = pd.Series(x).fillna(limit=2, method='ffill')

with:

filled = interpolate_gaps(x, limit=2)

We'll get a visually identical plot:

As a complete, stand-alone example:

import numpy as np
import matplotlib.pyplot as plt
np.random.seed(1977)

def interpolate_gaps(values, limit=None):
    """
    Fill gaps using linear interpolation, optionally only fill gaps up to a
    size of `limit`.
    """
    values = np.asarray(values)
    i = np.arange(values.size)
    valid = np.isfinite(values)
    filled = np.interp(i, i[valid], values[valid])

    if limit is not None:
        invalid = ~valid
        for n in range(1, limit+1):
            invalid[:-n] &= invalid[n:]
        filled[invalid] = np.nan

    return filled

x = np.random.normal(0, 1, 1000).cumsum()

# Set every third value to NaN
x[::3] = np.nan

# Set a few bigger gaps...
x[20:100], x[200:300], x[400:450] = np.nan, np.nan, np.nan

# Interpolate small gaps using numpy
filled = interpolate_gaps(x, limit=2)

# Let's plot the results
fig, axes = plt.subplots(nrows=2, sharex=True)
axes[0].plot(x, color='lightblue')
axes[1].plot(filled, color='lightblue')

axes[0].set(ylabel='Original Data')
axes[1].set(ylabel='Filled Data')

plt.show()
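The examples above measure gaps in numbers of samples. The data in the question is indexed by timestamps, so a threshold in time units (for instance 24 h) may be more natural. The following is a minimal sketch of that idea and is not part of the original answer: `break_at_long_gaps` is a hypothetical helper, and the sample data is made up. It drops the NaNs as in the question's first method, then inserts a single NaN wherever two consecutive valid samples are further apart than `max_gap`, so the line breaks only there.

```python
import numpy as np

def break_at_long_gaps(times, values, max_gap):
    """
    Drop NaN samples, then insert one NaN wherever two consecutive valid
    samples are more than `max_gap` apart, so a line plot joins nearby
    points but leaves a visible break across long outages.
    """
    times = np.asarray(times, dtype="datetime64[ns]")
    values = np.asarray(values, dtype=float)

    valid = np.isfinite(values)
    t, v = times[valid], values[valid]

    # Positions whose distance to the next valid sample exceeds max_gap
    big = np.flatnonzero(np.diff(t) > max_gap)

    # Insert a NaN (with a timestamp inside the gap) after each such position
    t_out = np.insert(t, big + 1, t[big] + max_gap)
    v_out = np.insert(v, big + 1, np.nan)
    return t_out, v_out

# Made-up data: hourly samples with scattered NaNs and a 27 h outage
start = np.datetime64("2015-06-01T00:00")
date_log = start + np.array([0, 1, 2, 3, 30, 31]) * np.timedelta64(1, "h")
Log_1a = np.array([1.0, np.nan, 2.0, 3.0, 4.0, np.nan])

t_plot, v_plot = break_at_long_gaps(date_log, Log_1a, np.timedelta64(24, "h"))
print(v_plot)
```

The returned arrays can then be passed to a line plot, for example `plt.plot(t_plot, v_plot, '-r')`, which should join the hourly points and leave the long outage blank.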

Note: I originally completely misread the question. See the version history for my original answer.
