Interpolate a pandas DataFrame
Question
I know this subject has come up a few times on Stack Overflow, but I am still stuck on an interpolation problem.
I have a complex DataFrame with a set of numeric columns; simplified, it could look something like this:
df_new = pd.DataFrame(np.random.randn(5,7), columns=[402.3, 407.2, 412.3, 415.8, 419.9, 423.5, 428.3])
wl = np.array([400.0, 408.2, 412.5, 417.2, 420.5, 423.3, 425.0])
What I need to do is interpolate column-wise, for each row, onto the newly assigned column values (wl).
And how do I get a new DataFrame whose columns contain ONLY the values present in the wl array?
Recommended answer
Use reindex to include wl as new columns (whose values will be filled with NaNs). Then use interpolate(axis=1) to interpolate across the columns. Note that the default method='linear' ignores the column labels and treats the values as equally spaced; method='index' uses the numeric labels as the x-coordinates instead. Strictly speaking, interpolation is only done between known values. You could, however, use limit_direction='both' to fill NaN edge values in both the forward and backward directions:
>>> df_new.reindex(columns=df_new.columns.union(wl)).interpolate(axis=1, limit_direction='both')
400.0 402.3 407.2 408.2 412.3 412.5 415.8 417.2 419.9 420.5 423.3 423.5 425.0 428.3
0 0.342346 0.342346 1.502418 1.102496 0.702573 0.379089 0.055606 -0.135563 -0.326732 -0.022298 0.282135 0.586569 0.164917 -0.256734
1 -0.220773 -0.220773 -0.567199 -0.789194 -1.011190 -0.485832 0.039526 -0.426771 -0.893069 -0.191818 0.509432 1.210683 0.414023 -0.382636
2 0.078147 0.078147 0.335040 -0.146892 -0.628824 -0.280976 0.066873 -0.881153 -1.829178 -0.960608 -0.092038 0.776532 0.458758 0.140985
3 -0.792214 -0.792214 0.254805 0.027573 -0.199659 -1.173250 -2.146841 -1.421482 -0.696124 -0.073018 0.550088 1.173194 -0.049967 -1.273128
4 -0.485818 -0.485818 0.019046 -1.421351 -2.861747 -1.020571 0.820605 0.097722 -0.625160 -0.782700 -0.940241 -1.097781 -0.809617 -0.521453
Note that pandas DataFrames store values in a primarily column-based data structure, so computations are generally more efficient when done column-wise rather than row-wise. It might therefore be better to transpose your DataFrame:
df = df_new.T
and then proceed as described above:
df = df.reindex(index=df.index.union(wl))
df = df.interpolate(limit_direction='both')
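To tie the transposed steps back to the original question (a DataFrame containing only the wl columns), here is a minimal sketch. It assumes that the interpolation should be weighted by the actual wavelength values, so it passes method='index' rather than relying on the default. The seed, the name wl_only, and the np.interp cross-check are illustrative additions and not part of the original answer.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # arbitrary seed, only so this sketch is reproducible
cols = [402.3, 407.2, 412.3, 415.8, 419.9, 423.5, 428.3]
df_new = pd.DataFrame(np.random.randn(5, 7), columns=cols)
wl = np.array([400.0, 408.2, 412.5, 417.2, 420.5, 423.3, 425.0])

# Transpose so that the wavelengths become the index.
df = df_new.T

# Add the target wavelengths as new (all-NaN) rows, in sorted order.
df = df.reindex(index=df.index.union(wl))

# method='index' uses the numeric index values as x-coordinates;
# limit_direction='both' fills the leading/trailing NaNs with the nearest known value.
df = df.interpolate(method='index', limit_direction='both')

# Keep only the requested wavelengths and transpose back (hypothetical name).
wl_only = df.loc[wl].T

# Cross-check against NumPy's 1-D linear interpolation, row by row.
expected = np.array([np.interp(wl, cols, row) for row in df_new.to_numpy()])
print(np.allclose(wl_only.to_numpy(), expected))
```

Edge values outside the known range (here 400.0) are held constant at the nearest known value, not extrapolated.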
If you want to extrapolate edge values, you could use scipy.interpolate.interp1d with fill_value='extrapolate':
import numpy as np
import pandas as pd
import scipy.interpolate as interpolate
np.random.seed(2018)
df_new = pd.DataFrame(np.random.randn(5,7), columns=[402.3, 407.2, 412.3, 415.8, 419.9, 423.5, 428.3])
wl = np.array([400.0, 408.2, 412.5, 417.2, 420.5, 423.3, 425.0, 500])
x = df_new.columns
y = df_new.values
newx = x.union(wl)
result = pd.DataFrame(
    interpolate.interp1d(x, y, fill_value='extrapolate')(newx),
    columns=newx)
which yields
400.0 402.3 407.2 408.2 412.3 412.5 415.8 417.2 419.9 420.5 423.3 423.5 425.0 428.3 500.0
0 -0.679793 -0.276768 0.581851 0.889017 2.148399 1.952520 -1.279487 -0.671080 0.502277 0.561236 0.836376 0.856029 0.543898 -0.142790 -15.062654
1 0.484717 0.110079 -0.688065 -0.468138 0.433564 0.437944 0.510221 0.279613 -0.165131 -0.362906 -1.285854 -1.351779 -0.758526 0.546631 28.904127
2 1.303039 1.230655 1.076446 0.628001 -1.210625 -1.158971 -0.306677 -0.563028 -1.057419 -0.814173 0.320975 0.402057 0.366778 0.289165 -1.397156
3 2.385057 1.282733 -1.065696 -1.191370 -1.706633 -1.618985 -0.172797 -0.092039 0.063710 0.114863 0.353577 0.370628 -0.246613 -1.604543 -31.108665
4 -3.360837 -2.165729 0.380370 0.251572 -0.276501 -0.293597 -0.575682 -0.235060 0.421854 0.469009 0.689062 0.704780 0.498724 0.045401 -9.804075
If you wish to create a DataFrame containing only the wl columns, you could sub-select those columns using result[wl], or you could simply interpolate only at the wl values:
result_wl = pd.DataFrame(
    interpolate.interp1d(x, y, fill_value='extrapolate')(wl),
    columns=wl)
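As a sanity check, the two routes to a wl-only DataFrame can be compared directly. The sketch below reuses the setup from the answer; the names f, same, and linear_tail are illustrative, and np.union1d stands in for Index.union so that everything stays in NumPy. It also shows what fill_value='extrapolate' does for the default linear kind: values beyond the last known wavelength continue the straight line through the last two known points.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import interp1d

np.random.seed(2018)
df_new = pd.DataFrame(np.random.randn(5, 7),
                      columns=[402.3, 407.2, 412.3, 415.8, 419.9, 423.5, 428.3])
wl = np.array([400.0, 408.2, 412.5, 417.2, 420.5, 423.3, 425.0, 500])

x = df_new.columns.to_numpy()
y = df_new.to_numpy()

# interp1d interpolates along the last axis of y by default, i.e. across columns.
f = interp1d(x, y, fill_value='extrapolate')

# Route 1: interpolate on the union of old and new wavelengths, then sub-select.
newx = np.union1d(x, wl)
result = pd.DataFrame(f(newx), columns=newx)

# Route 2: interpolate only at the requested wavelengths.
result_wl = pd.DataFrame(f(wl), columns=wl)

# Both routes should give the same values.
same = np.allclose(result.loc[:, wl].to_numpy(), result_wl.to_numpy())
print(same)

# The extrapolated value at 500 continues the line through the last two known points.
slope = (y[:, -1] - y[:, -2]) / (x[-1] - x[-2])
linear_tail = y[:, -1] + slope * (500.0 - x[-1])
print(np.allclose(result_wl[500.0].to_numpy(), linear_tail))
```

Because the extrapolation is a straight line, values far from the data (such as the 500.0 column) can become very large in magnitude, so they should be interpreted with care.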