Cannot interpolate dataframe even if most of the data is filled

Problem description

I tried to interpolate the NaN values in my DataFrame using the interpolate() method. However, the method failed with this error:

Cannot interpolate with all NaNs.

The code is as follows:

try:
    df3.interpolate(method='index', inplace=True)
    processor._arma(df3['TCA'])
except Exception as e:
    sys.stderr.write('%s: [%s] %s\n' % (time.strftime("%Y-%m-%d %H:%M:%S"), nid3, e))
    sys.stderr.write('%s: [%s] len=%d\n' % (time.strftime("%Y-%m-%d %H:%M:%S"), nid3, len(df3.index)))
    sys.stderr.write('%s: [%s] %s\n' % (time.strftime("%Y-%m-%d %H:%M:%S"), nid3, df3.to_string()))

This is strange, because most of the data is already filled, as you can see in Log 1 and Log 2. The DataFrame has length 20, and all of its data is shown below. Even when every cell is filled, I still can't use the interpolate method. BTW, df3 is a global variable; I'm not sure whether that could be a problem.

Log 1

2016-01-21 22:06:11: [ESIG_node_003_400585511] Cannot interpolate with all NaNs.
2016-01-21 22:06:11: [ESIG_node_003_400585511] len=20
2016-01-21 22:06:11: [ESIG_node_003_400585511]
                     TCA TCB TCC
2016-01-21 20:06:22  19  17  18
2016-01-21 20:06:23  19  17  18
2016-01-21 20:06:24  18  18  18
2016-01-21 20:06:25  18  17  18
2016-01-21 20:06:26  18  18  18
2016-01-21 20:06:27  19  18  18
2016-01-21 20:06:28  19  17  18
2016-01-21 20:06:29  18  18  18
2016-01-21 20:06:30  18  17  18
2016-01-21 20:06:31  19  17  18
2016-01-21 20:06:32  18  17  18
2016-01-21 20:06:33  18  18  18
2016-01-21 20:06:34  19  18  18
2016-01-21 20:06:35  18  17  18
2016-01-21 20:06:36  19  18  18
2016-01-21 20:06:37  18  18  18
2016-01-21 20:06:38  18  18  18
2016-01-21 20:06:39  19  18  18
2016-01-21 20:06:40  18  17  18
2016-01-21 20:06:41  18  18  18

Log 2

2016-01-21 22:06:14: [ESIG_node_003_400585511] Cannot interpolate with all NaNs.
2016-01-21 22:06:14: [ESIG_node_003_400585511] len=20
2016-01-21 22:06:14: [ESIG_node_003_400585511]
                      TCA  TCB  TCC
2016-01-21 20:06:33   18   18   18
2016-01-21 20:06:34   19   18   18
2016-01-21 20:06:35   18   17   18
2016-01-21 20:06:36   19   18   18
2016-01-21 20:06:37   18   18   18
2016-01-21 20:06:38   18   18   18
2016-01-21 20:06:39   19   18   18
2016-01-21 20:06:40   18   17   18
2016-01-21 20:06:41   18   18   18
2016-01-21 20:06:42  NaN  NaN  NaN
2016-01-21 20:06:43  NaN  NaN  NaN
2016-01-21 20:06:44  NaN  NaN  NaN
2016-01-21 20:06:45  NaN  NaN  NaN
2016-01-21 20:06:46   19   18   18
2016-01-21 20:06:47   18   17   18
2016-01-21 20:06:48   18   18   18
2016-01-21 20:06:49   19   18   18
2016-01-21 20:06:50   18   17   18
2016-01-21 20:06:51   18   18   18
2016-01-21 20:06:52   19   17   18

Recommended answer

Check that your DataFrame has numeric dtypes, not object dtypes. The TypeError: Cannot interpolate with all NaNs can occur if the DataFrame contains columns of object dtype. For example, if

import numpy as np
import pandas as pd

df = pd.DataFrame({'A':np.array([1,np.nan,30], dtype='O')}, 
                  index=['2016-01-21 20:06:22', '2016-01-21 20:06:23', 
                         '2016-01-21 20:06:24'])

then df.interpolate() raises the TypeError.

To check if your DataFrame has columns with object dtype, look at df3.dtypes:

In [92]: df.dtypes
Out[92]: 
A    object
dtype: object
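
When a DataFrame has many columns, scanning the dtypes output by eye gets tedious. As a minimal sketch, the object-dtype columns can be listed directly with select_dtypes. The frame below and its column names "A" and "B" are made up for illustration and are not the asker's data:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column "A" holds numbers stored as Python objects,
# column "B" is an ordinary float64 column.
df = pd.DataFrame({
    "A": np.array([1, np.nan, 30], dtype="O"),
    "B": [1.0, np.nan, 3.0],
})

# select_dtypes keeps only the columns whose dtype matches; here, object columns.
object_cols = df.select_dtypes(include=["object"]).columns.tolist()
print(object_cols)
```

Any column name printed here is a candidate for conversion before calling interpolate().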

To fix the problem, you need to ensure the DataFrame has numeric columns with native NumPy dtypes. Obviously, it would be best to build the DataFrame correctly from the very beginning. So the best solution depends on how you are building the DataFrame.
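
The question does not show how df3 is built, so the following is only a sketch of one way to get numeric dtypes from the start. It assumes the readings arrive as strings with None for missing values; the names raw, timestamps, and to_float are hypothetical:

```python
import pandas as pd

# Hypothetical raw readings, e.g. parsed from text: strings, with None for gaps.
timestamps = ["2016-01-21 20:06:41", "2016-01-21 20:06:42", "2016-01-21 20:06:43"]
raw = {"TCA": ["18", None, "19"], "TCB": ["18", None, "18"]}


def to_float(value):
    """Convert a raw reading to float; missing or malformed readings become NaN."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return float("nan")


# Convert before constructing the frame so every column is float64,
# and parse the index into a DatetimeIndex up front.
df_numeric = pd.DataFrame(
    {name: [to_float(v) for v in values] for name, values in raw.items()},
    index=pd.to_datetime(timestamps),
)
print(df_numeric.dtypes)
```

With float64 columns and a DatetimeIndex in place from the beginning, no after-the-fact conversion is needed before interpolating.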

A less appealing patch-up fix would be to use pd.to_numeric to convert the object arrays to numeric arrays after-the-fact:

for col in df:
    df[col] = pd.to_numeric(df[col], errors='coerce')

With errors='coerce', any value that could not be converted to a number is converted to NaN. After calling pd.to_numeric on each column, notice that the dtype is now float64:

In [94]: df.dtypes
Out[94]: 
A    float64
dtype: object

Once the DataFrame has numeric dtypes, and the DataFrame has a DatetimeIndex, then df.interpolate(method='time') will work:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A':np.array([1,np.nan,30], dtype='O')}, 
                  index=['2016-01-21 20:06:22', '2016-01-21 20:06:23', 
                         '2016-01-21 20:06:24'])

for col in df:
    df[col] = pd.to_numeric(df[col], errors='coerce')
df.index = pd.DatetimeIndex(df.index)
df = df.interpolate(method='time')
print(df)

yields

                        A
2016-01-21 20:06:22   1.0
2016-01-21 20:06:23  15.5
2016-01-21 20:06:24  30.0
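
The same steps can be wrapped in a small helper and tried on data shaped like the gap in Log 2. This is a sketch only: coerce_and_interpolate and df3_like are hypothetical names, and the object dtype of the column is an assumption about the asker's data, not something the logs confirm:

```python
import numpy as np
import pandas as pd


def coerce_and_interpolate(frame):
    """Return a copy with numeric columns and a DatetimeIndex, NaNs filled by time."""
    out = frame.copy()
    for col in out:
        # Values that cannot be parsed as numbers become NaN.
        out[col] = pd.to_numeric(out[col], errors="coerce")
    out.index = pd.to_datetime(out.index)
    return out.interpolate(method="time")


# Hypothetical excerpt modeled on Log 2: four missing seconds between two readings,
# stored with object dtype to mimic the failing case.
index = ["2016-01-21 20:06:41", "2016-01-21 20:06:42", "2016-01-21 20:06:43",
         "2016-01-21 20:06:44", "2016-01-21 20:06:45", "2016-01-21 20:06:46"]
df3_like = pd.DataFrame(
    {"TCA": np.array([18, np.nan, np.nan, np.nan, np.nan, 19], dtype="O")},
    index=index,
)

filled = coerce_and_interpolate(df3_like)
print(filled)
```

Because the timestamps are evenly spaced, the four missing rows are filled with evenly spaced values between 18 and 19. The helper returns a new frame rather than modifying its argument, which avoids surprises when the original is a global such as df3.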
