pandas MultiIndex DataFrame: N-D interpolation for missing values

Question

Is it possible in pandas to interpolate missing values in a DataFrame with a MultiIndex? The example below does not work as expected:

import numpy as np
import pandas as pd

arr1 = np.arange(1., 10., 1.)
arr2 = np.arange(2., 20., 2.)
df1 = pd.DataFrame({'x': arr1, 'y': arr2, 'xplusy': arr1 + arr2, 'xtimesy': arr1 * arr2})
df1 = df1.set_index(['x', 'y'])

# Reindex to add three (x, y) pairs that have no data yet
new_index = pd.MultiIndex.from_tuples(list(df1.index) + [(2, 2), (3, 2), (5, 5)], names=['x', 'y'])
df2 = df1.reindex(new_index).sort_index()
df2 = df2.interpolate(method='linear')

For the added index entries, the values in the xplusy and xtimesy columns are not what I expected:

(x, y)       xplusy  xtimesy
-----------  ------  -------
(1.0, 2.0)      3.0        2
(2.0, 2.0)      4.5        5
(2.0, 4.0)      6.0        8
(3.0, 2.0)      7.5       13
(3.0, 6.0)      9.0       18
(4.0, 8.0)     12.0       32
(5.0, 5.0)     13.5       41
(5.0, 10.0)    15.0       50
(6.0, 12.0)    18.0       72
(7.0, 14.0)    21.0       98
(8.0, 16.0)    24.0      128
(9.0, 18.0)    27.0      162

Recommended answer

Before the missing values are filled, this is what the first few rows look like:

df2

      xplusy  xtimesy
x y                  
1 2        3        2
2 2      NaN      NaN
  4        6        8

It looks like you want to interpolate based on the MultiIndex. I don't believe pandas' interpolate can do that, but it can interpolate against a simple (single-level) index. Incidentally, method='linear' ignores the index values and treats the rows as equally spaced; it is also the default, so there is no need to specify it. To use the index values, move one level out of the index and pass method='index':

df2.reset_index(level=1).interpolate(method='index')

    y  xplusy  xtimesy
x                     
1   2       3        2
2   2       6        8
2   4       6        8

df2.reset_index(level=0).interpolate(method='index')

    x  xplusy  xtimesy
y                     
2   1     3.0        2
2   2     3.0        2
4   2     6.0        8
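
To make the difference between method='linear' and method='index' concrete, here is a minimal sketch on a made-up Series (the values and index are hypothetical, not taken from the question). The index spacing is deliberately uneven so the two methods disagree:

```python
import numpy as np
import pandas as pd

# Hypothetical series with unevenly spaced index values
s = pd.Series([0.0, np.nan, 10.0], index=[0.0, 1.0, 10.0])

by_position = s.interpolate(method='linear')  # treats rows as equally spaced
by_index = s.interpolate(method='index')      # weights by the index values

print(by_position.loc[1.0])  # 5.0, halfway between the neighbouring rows
print(by_index.loc[1.0])     # 1.0, since 1.0 is one tenth of the way from 0.0 to 10.0
```

This is why the question's output shows plain midpoints of the neighbouring rows: with method='linear' the (x, y) index values play no role.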

Obviously, in this case you could build xplusy and xtimesy in several steps (first x, then y, then xplusy and xtimesy), but I'm not sure that is what you are really trying to do.
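
One possible reading of that suggestion, sketched below, is to skip interpolation and recompute the derived columns directly from the index levels. This only applies when the columns are known functions of x and y, as they are in the question's toy data; the variable names `extra`, `full_index`, and `filled` are made up for this example.

```python
import numpy as np
import pandas as pd

arr1 = np.arange(1., 10., 1.)
arr2 = np.arange(2., 20., 2.)
df1 = pd.DataFrame(
    {'xplusy': arr1 + arr2, 'xtimesy': arr1 * arr2},
    index=pd.MultiIndex.from_arrays([arr1, arr2], names=['x', 'y']),
)

# Add the three (x, y) pairs that have no data yet
extra = [(2.0, 2.0), (3.0, 2.0), (5.0, 5.0)]
full_index = pd.MultiIndex.from_tuples(list(df1.index) + extra, names=['x', 'y'])
df2 = df1.reindex(full_index).sort_index()

# Recompute both columns from the index levels instead of interpolating
x = df2.index.get_level_values('x').to_numpy()
y = df2.index.get_level_values('y').to_numpy()
filled = df2.assign(xplusy=x + y, xtimesy=x * y)

print(filled.loc[(2.0, 2.0)])
```

For real data, where the columns are measurements rather than formulas, this shortcut is not available and some form of interpolation is still needed.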

Anyway, this is the kind of 1-D interpolation that pandas' interpolate handles easily. If that's not enough, you could look into interp2d for starters. Note that it lives in SciPy (scipy.interpolate.interp2d), not NumPy, and that recent SciPy releases have deprecated it in favor of functions such as griddata and RegularGridInterpolator.
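
For a genuinely 2-D fill, a minimal sketch using scipy.interpolate.griddata is shown below. This is an assumption on my part: the answer only names interp2d, and griddata is swapped in because it accepts scattered (x, y) points. The data here is a hypothetical 3x3 grid rather than the question's sample, because the question's known points all lie on the line y = 2x, and collinear points cannot be triangulated for 2-D linear interpolation.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import griddata

# Hypothetical data on a 3x3 grid of (x, y) pairs
index = pd.MultiIndex.from_product(
    [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]], names=['x', 'y']
)
x = index.get_level_values('x').to_numpy()
y = index.get_level_values('y').to_numpy()
df = pd.DataFrame({'xplusy': x + y}, index=index)

# Knock out the centre value so there is something to interpolate
df.loc[(2.0, 4.0), 'xplusy'] = np.nan

points = np.column_stack([x, y])           # one (x, y) row per index entry
known = df['xplusy'].notna().to_numpy()    # mask of rows that have data

# Interpolate the missing rows from the known ones in (x, y) space
df.loc[~known, 'xplusy'] = griddata(
    points[known],
    df.loc[known, 'xplusy'].to_numpy(),
    points[~known],
    method='linear',
)

print(df.loc[(2.0, 4.0), 'xplusy'])
```

Because x + y is linear in both coordinates, linear interpolation should recover the missing value (6.0) up to floating-point error. A nonlinear column such as xtimesy would only be approximated, and query points outside the convex hull of the known points come back as NaN with method='linear'.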
