Fill multi-index Pandas DataFrame with interpolation
Question
I would like to bfill and ffill a multi-index DataFrame containing NaNs (in this case the ImpVol field) using the interpolate method. A section of the DataFrame might look like this:
Expiration  OptionType  Strike  ImpVol
2014-12-26  call        140.0   NaN
                        145.0   NaN
                        147.0   NaN
                        149.0   NaN
                        150.0   NaN
                        152.5   NaN
                        155.0   0.233631
                        157.5   0.206149
                        160.0   0.149118
                        162.5   0.110867
                        165.0   0.110047
                        167.5   NaN
                        170.0   NaN
                        172.5   NaN
                        175.0   NaN
                        177.5   NaN
                        180.0   NaN
                        187.5   NaN
                        192.5   NaN
            put         132.0   NaN
                        135.0   NaN
                        140.0   NaN
                        141.0   NaN
                        142.0   0.541311
                        143.0   NaN
                        144.0   0.546672
                        145.0   0.504691
                        146.0   0.485586
                        147.0   0.426898
                        148.0   0.418084
                        149.0   0.405254
                        150.0   0.372353
                        152.5   0.311049
                        155.0   0.246892
                        157.5   0.187426
                        160.0   0.132475
                        162.5   0.098377
                        165.0   NaN
                        167.5   0.249519
                        170.0   0.270546
                        180.0   NaN
                        182.5   0.634539
                        185.0   0.656332
                        187.5   0.711593
2015-01-02  call        145.0   NaN
                        146.0   NaN
                        149.0   NaN
                        150.0   NaN
                        152.5   NaN
                        155.0   0.213742
                        157.5   0.205705
                        160.0   0.160824
                        162.5   0.143180
                        165.0   0.129292
                        167.5   0.127415
                        170.0   0.148275
                        172.5   NaN
                        175.0   NaN
                        180.0   NaN
                        182.5   NaN
                        195.0   NaN
            put         135.0   0.493639
                        140.0   0.463828
                        141.0   0.459619
                        142.0   0.442729
                        143.0   0.431823
                        145.0   0.391141
                        147.0   0.313090
                        148.0   0.310796
                        149.0   0.296146
                        150.0   0.280965
                        152.5   0.240727
                        155.0   0.203776
                        157.5   0.175431
                        160.0   0.143198
                        162.5   0.121621
                        165.0   0.105060
                        167.5   0.160085
                        170.0   NaN
For those not familiar with the domain, I'm interpolating missing (or bad) implied option volatilities. These need to be interpolated across strike within each expiration and option type combination, and cannot be interpolated across the entire population of options. For example, I have to interpolate across the 2014-12-26 call options separately from the 2014-12-26 put options.
I was previously selecting a slice of the values to interpolate with something like this:
optype = 'call'
expiry = '2014-12-26'
s = df['ImpVol'][expiry][optype].interpolate().ffill().bfill()
However, the frame can be quite large and I'd like to avoid having to loop through each of the indexes. If I use the interpolate method to fill without selecting a slice (i.e. across the entire frame), interpolate will interpolate across all of the sub-indexes, which is not what I want. For example:
print(df['ImpVol'].interpolate().ffill().bfill())
Expiration  OptionType  Strike  ImpVol
2014-12-26  call        140.0   0.233631
                        145.0   0.233631
                        147.0   0.233631
                        149.0   0.233631
                        150.0   0.233631
                        152.5   0.233631
                        155.0   0.233631
                        157.5   0.206149
                        160.0   0.149118
                        162.5   0.110867
                        165.0   0.110047
                        167.5   0.143222
                        170.0   0.176396
                        172.5   0.209570
                        175.0   0.242744
                        177.5   0.275918
                        180.0   0.309092
                        187.5   0.342267
                        192.5   0.375441   <-- interpolates from the 2014-12-26 call...
            put         132.0   0.408615   <-- ... to the 2014-12-26 put, which is bad
                        135.0   0.441789
                        140.0   0.474963
                        141.0   0.508137
                        142.0   0.541311
                        143.0   0.543992
                        144.0   0.546672
                        145.0   0.504691
                        146.0   0.485586
                        147.0   0.426898
                        148.0   0.418084
                        149.0   0.405254
                        150.0   0.372353
                        152.5   0.311049
                        155.0   0.246892
                        157.5   0.187426
                        160.0   0.132475
                        162.5   0.098377
                        165.0   0.173948
                        167.5   0.249519
                        170.0   0.270546
                        180.0   0.452542
                        182.5   0.634539
                        185.0   0.656332
                        187.5   0.711593
The question is then: how can I fill each subsection of the multi-index DataFrame based on the indexes?
Recommended answer
I'd try to unstack the DataFrame at the OptionType level of the index.
df.unstack(level=1)
This way you should obtain a DataFrame in which the call and put categories have been moved to columns, leaving Expiration and Strike in the index. Maybe it's not the most elegant way of solving the problem, but it should work things out, keeping the call and put strikes from overlapping.
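As a rough sketch of the unstacking idea on a hypothetical miniature of the frame (the values and the names wide, filled and naive are made up for illustration), the example below moves both Expiration and OptionType into the columns, so that each column holds exactly one expiration/type group and a column-wise interpolate cannot cross group boundaries. Unstacking both levels rather than only level=1 is an assumption here: with only OptionType unstacked, a column would still run across several expirations.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the frame in the question (values are made up).
index = pd.MultiIndex.from_tuples(
    [
        ("2014-12-26", "call", 150.0),
        ("2014-12-26", "call", 155.0),
        ("2014-12-26", "call", 160.0),
        ("2014-12-26", "put", 150.0),
        ("2014-12-26", "put", 155.0),
        ("2014-12-26", "put", 160.0),
        ("2015-01-02", "call", 150.0),
        ("2015-01-02", "call", 155.0),
        ("2015-01-02", "call", 160.0),
    ],
    names=["Expiration", "OptionType", "Strike"],
)
df = pd.DataFrame(
    {"ImpVol": [np.nan, 0.2, np.nan, 0.5, np.nan, 0.3, 0.4, np.nan, 0.2]},
    index=index,
)

# Filling the whole column at once lets values bleed across groups:
# the call at strike 160 ends up between the call 0.2 and the put 0.5.
naive = df["ImpVol"].interpolate().ffill().bfill()

# Move both grouping levels into the columns: rows are strikes,
# and every column is a single (Expiration, OptionType) group.
wide = df["ImpVol"].unstack(level=["Expiration", "OptionType"])

# interpolate/ffill/bfill work down each column independently.
filled = wide.interpolate().ffill().bfill()

print(naive.loc[("2014-12-26", "call", 160.0)])     # about 0.35 (contaminated by the put)
print(filled[("2014-12-26", "call")].loc[160.0])    # 0.2 (stays within the call group)
```

Note that if the groups do not share the same strikes, unstacking pads the missing combinations with NaN rows, and those get filled as well. Because the default linear method treats rows as equally spaced, that padding can also change the interpolated values compared with the per-slice approach.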
If a multi-index DataFrame is the most desirable form for further computations, you can restore the original format using the stack method.
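To sketch the round trip back to the long format on a hypothetical miniature of the frame (values and variable names invented for illustration): the block below unstacks the two grouping levels, fills column-wise, then uses stack to rebuild the three-level index, reindexing to the original index to recover the original row order and drop any padding rows. As a cross-check that is not part of the answer above, it also computes the same fill with groupby over Expiration and OptionType, which avoids the reshape altogether.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the frame in the question (values are made up).
index = pd.MultiIndex.from_tuples(
    [
        ("2014-12-26", "call", 150.0),
        ("2014-12-26", "call", 155.0),
        ("2014-12-26", "call", 160.0),
        ("2014-12-26", "put", 150.0),
        ("2014-12-26", "put", 155.0),
        ("2014-12-26", "put", 160.0),
        ("2015-01-02", "call", 150.0),
        ("2015-01-02", "call", 155.0),
        ("2015-01-02", "call", 160.0),
    ],
    names=["Expiration", "OptionType", "Strike"],
)
df = pd.DataFrame(
    {"ImpVol": [np.nan, 0.2, np.nan, 0.5, np.nan, 0.3, 0.4, np.nan, 0.2]},
    index=index,
)

group_levels = ["Expiration", "OptionType"]

# Wide format: one column per (Expiration, OptionType) group, filled per column.
wide = df["ImpVol"].unstack(level=group_levels)
filled = wide.interpolate().ffill().bfill()

# Back to the long format: stack the column levels into the index,
# restore the original level order, then align to the original rows.
restored = (
    filled.stack(group_levels)
    .reorder_levels(df.index.names)
    .reindex(df.index)
    .rename("ImpVol")
)

# Cross-check (an alternative, not from the answer): fill within each group directly.
by_group = df.groupby(level=group_levels)["ImpVol"].transform(
    lambda s: s.interpolate().ffill().bfill()
)

print(restored)
print(np.allclose(restored.to_numpy(), by_group.to_numpy()))
```

On this toy data restored and by_group hold the same values. On real data they can differ when groups have different sets of strikes, because unstacking pads missing strikes with NaN rows that change the spacing seen by the default linear method.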