Fill multi-index Pandas DataFrame with interpolation


Question

I would like to bfill and ffill a multi-index DataFrame containing NaNs (in this case the ImpVol field) using the interpolate method. A section of the DataFrame might look like this:

Expiration  OptionType  Strike    ImpVol
2014-12-26  call        140.0          NaN
                        145.0          NaN
                        147.0          NaN
                        149.0          NaN
                        150.0          NaN
                        152.5          NaN
                        155.0     0.233631
                        157.5     0.206149
                        160.0     0.149118
                        162.5     0.110867
                        165.0     0.110047
                        167.5          NaN
                        170.0          NaN
                        172.5          NaN
                        175.0          NaN
                        177.5          NaN
                        180.0          NaN
                        187.5          NaN
                        192.5          NaN
            put         132.0          NaN
                        135.0          NaN
                        140.0          NaN
                        141.0          NaN
                        142.0     0.541311
                        143.0          NaN
                        144.0     0.546672
                        145.0     0.504691
                        146.0     0.485586
                        147.0     0.426898
                        148.0     0.418084
                        149.0     0.405254
                        150.0     0.372353
                        152.5     0.311049
                        155.0     0.246892
                        157.5     0.187426
                        160.0     0.132475
                        162.5     0.098377
                        165.0          NaN
                        167.5     0.249519
                        170.0     0.270546
                        180.0          NaN
                        182.5     0.634539
                        185.0     0.656332
                        187.5     0.711593
2015-01-02  call        145.0          NaN
                        146.0          NaN
                        149.0          NaN
                        150.0          NaN
                        152.5          NaN
                        155.0     0.213742
                        157.5     0.205705
                        160.0     0.160824
                        162.5     0.143180
                        165.0     0.129292
                        167.5     0.127415
                        170.0     0.148275
                        172.5          NaN
                        175.0          NaN
                        180.0          NaN
                        182.5          NaN
                        195.0          NaN
            put         135.0     0.493639
                        140.0     0.463828
                        141.0     0.459619
                        142.0     0.442729
                        143.0     0.431823
                        145.0     0.391141
                        147.0     0.313090
                        148.0     0.310796
                        149.0     0.296146
                        150.0     0.280965
                        152.5     0.240727
                        155.0     0.203776
                        157.5     0.175431
                        160.0     0.143198
                        162.5     0.121621
                        165.0     0.105060
                        167.5     0.160085
                        170.0          NaN

For those of you not familiar with the domain, I'm interpolating missing (or bad) implied option volatilities. These need to be interpolated across strikes within each expiration and option type combination and cannot be interpolated across the entire population of options. For example, I have to interpolate across the 2014-12-26 call options separately from the 2014-12-26 put options.

I was previously selecting a slice of the values to interpolate with something like this:

optype = 'call'
expiry = '2014-12-26'

s = df['ImpVol'][expiry][optype].interpolate().ffill().bfill()

but the frame can be quite large and I'd like to avoid having to loop through each of the indexes. If I use the interpolate method to fill without selecting a slice (i.e. across the entire frame), interpolate will interpolate across all of the sub indexes which is what I do not want. For example:

print(df['ImpVol'].interpolate().ffill().bfill())

Expiration  OptionType  Strike    ImpVol
2014-12-26  call        140.0     0.233631
                        145.0     0.233631
                        147.0     0.233631
                        149.0     0.233631
                        150.0     0.233631
                        152.5     0.233631
                        155.0     0.233631
                        157.5     0.206149
                        160.0     0.149118
                        162.5     0.110867
                        165.0     0.110047
                        167.5     0.143222
                        170.0     0.176396
                        172.5     0.209570
                        175.0     0.242744
                        177.5     0.275918
                        180.0     0.309092
                        187.5     0.342267
                        192.5     0.375441 <-- interpolates from the 2014-12-26 call...
            put         132.0     0.408615 <-- ... to the 2014-12-26 put, which is bad
                        135.0     0.441789
                        140.0     0.474963
                        141.0     0.508137
                        142.0     0.541311
                        143.0     0.543992
                        144.0     0.546672
                        145.0     0.504691
                        146.0     0.485586
                        147.0     0.426898
                        148.0     0.418084
                        149.0     0.405254
                        150.0     0.372353
                        152.5     0.311049
                        155.0     0.246892
                        157.5     0.187426
                        160.0     0.132475
                        162.5     0.098377
                        165.0     0.173948
                        167.5     0.249519
                        170.0     0.270546
                        180.0     0.452542
                        182.5     0.634539
                        185.0     0.656332
                        187.5     0.711593

The question is then, how can I fill each subsection of the multi index data frame based on the indexes?

Recommended answer

I'd try to unstack the data frame at the OptionType level of the index.

df.unstack(level=1)

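This way you should obtain a DataFrame indexed by Expiration and Strike, with the call and put categories moved to columns. It may not be the most elegant way of solving the problem, but it should work, since the call and put strikes no longer run into each other. Note that interpolating down those columns would still cross expirations, so Expiration may need to be unstacked as well, for example with df.unstack(level=[0, 1]).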
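As a rough sketch of the unstack idea, the snippet below moves both Expiration and OptionType into the columns, so that each column holds exactly one group, fills each column independently, and then stacks back. The miniature frame and its values are made up for illustration. The final reindex step is an assumption about what "restoring the original format" should mean: it drops any strikes that exist only in other groups.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the frame in the question (values are made up).
index = pd.MultiIndex.from_tuples(
    [
        ("2014-12-26", "call", 150.0),
        ("2014-12-26", "call", 155.0),
        ("2014-12-26", "call", 160.0),
        ("2014-12-26", "put", 150.0),
        ("2014-12-26", "put", 155.0),
        ("2014-12-26", "put", 160.0),
        ("2015-01-02", "call", 150.0),
        ("2015-01-02", "call", 155.0),
        ("2015-01-02", "call", 160.0),
    ],
    names=["Expiration", "OptionType", "Strike"],
)
df = pd.DataFrame(
    {"ImpVol": [0.20, 0.10, np.nan, 0.50, np.nan, 0.30, np.nan, 0.25, 0.15]},
    index=index,
)

# One column per (Expiration, OptionType) pair, one row per Strike.
wide = df["ImpVol"].unstack(level=["Expiration", "OptionType"])

# interpolate/ffill/bfill work down each column, so groups cannot mix.
wide_filled = wide.interpolate().ffill().bfill()

# Stack back to long form and align with the original index order.
filled = (
    wide_filled.stack(["Expiration", "OptionType"])
    .reorder_levels(df.index.names)
    .reindex(df.index)
    .rename("ImpVol")
)
print(filled)
```

One caveat with this layout: the rows of the wide frame are the union of all strikes, so a group that lacks some strikes gets extra NaN rows, and the default linear interpolation counts those rows as steps. The filled values can therefore differ from a per-slice fill when the groups do not share the same strikes.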

If a multi-index DataFrame is preferable for further computations, you can restore the original format with the stack method.
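A different technique from the unstack approach, which is not part of the original answer, is to group on the Expiration and OptionType index levels and apply the same fill chain to each group. This keeps the multi-index intact throughout. The sketch below uses a made-up miniature frame, and fill_group is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the frame in the question (values are made up).
index = pd.MultiIndex.from_tuples(
    [
        ("2014-12-26", "call", 150.0),
        ("2014-12-26", "call", 155.0),
        ("2014-12-26", "call", 160.0),
        ("2014-12-26", "put", 150.0),
        ("2014-12-26", "put", 155.0),
        ("2014-12-26", "put", 160.0),
        ("2015-01-02", "call", 150.0),
        ("2015-01-02", "call", 155.0),
        ("2015-01-02", "call", 160.0),
    ],
    names=["Expiration", "OptionType", "Strike"],
)
df = pd.DataFrame(
    {"ImpVol": [0.20, 0.10, np.nan, 0.50, np.nan, 0.30, np.nan, 0.25, 0.15]},
    index=index,
)


def fill_group(s):
    """Same fill chain as in the question, applied to a single group."""
    return s.interpolate().ffill().bfill()


# transform returns a Series aligned with the original multi-index.
filled = (
    df.groupby(level=["Expiration", "OptionType"])["ImpVol"]
    .transform(fill_group)
)

# For comparison: filling the whole column lets values leak across groups.
naive = df["ImpVol"].interpolate().ffill().bfill()

print(pd.DataFrame({"grouped": filled, "naive": naive}))
```

In this toy data the last call strike for 2014-12-26 is filled from the call values only in the grouped column, while the naive column blends it with the first put value, which is the behavior the question wants to avoid.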
