pandas 重新采样插值产生NaNs [英] pandas resample interpolate is producing NaNs

查看:146
本文介绍了 pandas 重新采样插值产生NaNs的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

此示例修改而成:

import io
import pandas as pd
import matplotlib.pyplot as plt

data = io.StringIO('''\
Values
1992-08-27 07:46:48,1
1992-08-27 08:00:48,2
1992-08-27 08:33:48,4
1992-08-27 08:43:48,3
1992-08-27 08:48:48,1
1992-08-27 08:51:48,5
1992-08-27 08:53:48,4
1992-08-27 08:56:48,2
1992-08-27 09:03:48,1
''')
s = pd.read_csv(data, squeeze=True)
s.index = pd.to_datetime(s.index)

res = s.resample('4s').interpolate('linear')
print(res)
plt.plot(res, '.-')
plt.plot(s, 'o')
plt.grid(True)

它按预期工作:

1992-08-27 07:46:48    1.000000
1992-08-27 07:46:52    1.004762
1992-08-27 07:46:56    1.009524
1992-08-27 07:47:00    1.014286
1992-08-27 07:47:04    1.019048
1992-08-27 07:47:08    1.023810
1992-08-27 07:47:12    1.028571
....

但是如果我将重采样更改为'5s',它只会产生NaN:

but if I change the resample to '5s', it produces only NaNs:

1992-08-27 07:46:45   NaN
1992-08-27 07:46:50   NaN
1992-08-27 07:46:55   NaN
1992-08-27 07:47:00   NaN
1992-08-27 07:47:05   NaN
1992-08-27 07:47:10   NaN
1992-08-27 07:47:15   NaN
....

为什么?

推荐答案

选项1
这是因为'4s'与您现有的索引完全一致.当您resample时,您可以从旧系列中得到表示并能够进行插值.您要做的是创建一个索引,该索引是旧索引与新索引的并集.然后插值并用新索引重新索引.

Option 1
That's because '4s' aligns perfectly with your existing index. When you resample, you get representation from your old series and are able to interpolate. What you want to do is to create an index that is the union of the old index with a new index. Then interpolate and reindex with a new index.

oidx = s.index
nidx = pd.date_range(oidx.min(), oidx.max(), freq='5s')
res = s.reindex(oidx.union(nidx)).interpolate('index').reindex(nidx)
res.plot(style='.-')
s.plot(style='o')

选项2A
如果您愿意放弃准确性,则可以将ffill限制为1

Option 2A
If you are willing to forgo accuracy, you can ffill with a limit of 1

res = s.resample('5s').ffill(limit=1).interpolate()
res.plot(style='.-')
s.plot(style='o')

选项2B
bfill

res = s.resample('5s').bfill(limit=1).interpolate()
res.plot(style='.-')
s.plot(style='o')

选项3
中级复杂性和准确性

Option 3
Intermediate complexity and accuracy

nidx = pd.date_range(oidx.min(), oidx.max(), freq='5s')
res = s.reindex(nidx, method='nearest', limit=1).interpolate()
res.plot(style='.-')
s.plot(style='o')

这篇关于 pandas 重新采样插值产生NaNs的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆