Interpolate a pandas DataFrame


Question

I know this subject has been brought up a few times on Stack Overflow, but I'm still stumbling over an interpolation problem.

I have a complex DataFrame with a set of columns which, simplified, could look something like this:

import numpy as np
import pandas as pd
df_new = pd.DataFrame(np.random.randn(5,7), columns=[402.3, 407.2, 412.3, 415.8, 419.9, 423.5, 428.3])
wl     = np.array([400.0, 408.2, 412.5, 417.2, 420.5, 423.3, 425.0])

So what I need to do is interpolate each row across the columns, onto the newly assigned column values in wl.

And how do I get a new DataFrame whose columns contain ONLY the values present in the wl array?

Recommended answer

Use reindex to add the wl values as new columns (filled with NaN), then use interpolate(axis=1) to interpolate across the columns. Note that the default method='linear' ignores the column labels and treats the columns as equally spaced; pass method='index' if the numeric labels should be used as x-coordinates. Strictly speaking, interpolation is only done between known values. You could, however, use limit_direction='both' to fill NaN edge values in both the forward and backward directions (they take the nearest known value, as columns 400.0 and 402.3 show in the output):

>>> df_new.reindex(columns=df_new.columns.union(wl)).interpolate(axis=1, limit_direction='both')
      400.0     402.3     407.2     408.2     412.3     412.5     415.8     417.2     419.9     420.5     423.3     423.5     425.0     428.3
0  0.342346  0.342346  1.502418  1.102496  0.702573  0.379089  0.055606 -0.135563 -0.326732 -0.022298  0.282135  0.586569  0.164917 -0.256734
1 -0.220773 -0.220773 -0.567199 -0.789194 -1.011190 -0.485832  0.039526 -0.426771 -0.893069 -0.191818  0.509432  1.210683  0.414023 -0.382636
2  0.078147  0.078147  0.335040 -0.146892 -0.628824 -0.280976  0.066873 -0.881153 -1.829178 -0.960608 -0.092038  0.776532  0.458758  0.140985
3 -0.792214 -0.792214  0.254805  0.027573 -0.199659 -1.173250 -2.146841 -1.421482 -0.696124 -0.073018  0.550088  1.173194 -0.049967 -1.273128
4 -0.485818 -0.485818  0.019046 -1.421351 -2.861747 -1.020571  0.820605  0.097722 -0.625160 -0.782700 -0.940241 -1.097781 -0.809617 -0.521453
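
To illustrate the difference between the default method='linear' and method='index', here is a minimal sketch on a tiny hand-made frame. The values and the names expanded, by_position and by_label are made up for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# One row with easy numbers (illustrative values, not from the question).
df = pd.DataFrame([[0.0, 10.0, 30.0]], columns=[400.0, 410.0, 430.0])
wl = np.array([405.0, 415.0])

# Add the target wavelengths as NaN columns, sorted in with the originals.
expanded = df.reindex(columns=df.columns.union(wl))

# Default method='linear': columns are treated as equally spaced,
# so 415.0 lands halfway between the values at 410.0 and 430.0.
by_position = expanded.interpolate(axis=1, limit_direction='both')

# method='index': the numeric column labels are used as x-coordinates,
# so 415.0 lands a quarter of the way from 410.0 to 430.0.
by_label = expanded.interpolate(method='index', axis=1, limit_direction='both')

print(by_position.loc[0, 415.0])  # 20.0
print(by_label.loc[0, 415.0])     # 15.0
```

If the column labels are physical quantities such as wavelengths, method='index' is probably the behaviour you want.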

Note that pandas DataFrames store values in a primarily column-based data structure, so computations are generally more efficient when done column-wise rather than row-wise. It might therefore be better to transpose your DataFrame:

df = df_new.T

and then proceed much as described above:

df = df.reindex(index=df.index.union(wl))
df = df.interpolate(limit_direction='both')
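
Putting the transposed steps together, a sketch of the full round trip might look like the following. It assumes you want the numeric labels used as x-coordinates (hence method='index', which the snippet above does not pass), and the name df_wl is hypothetical.

```python
import numpy as np
import pandas as pd

np.random.seed(2018)
df_new = pd.DataFrame(
    np.random.randn(5, 7),
    columns=[402.3, 407.2, 412.3, 415.8, 419.9, 423.5, 428.3])
wl = np.array([400.0, 408.2, 412.5, 417.2, 420.5, 423.3, 425.0])

df = df_new.T                               # wavelengths become the index
df = df.reindex(index=df.index.union(wl))   # add the wl values as NaN rows
# Assumption: interpolate against the index values, not row positions.
df = df.interpolate(method='index', limit_direction='both')

# Keep only the requested wavelengths and restore the original orientation.
df_wl = df.loc[wl].T
print(df_wl.shape)  # (5, 7)
```

Values outside the known range (400.0 here) are filled with the nearest known value, not extrapolated.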


If you want to extrapolate edge values, you could use scipy.interpolate.interp1d with fill_value='extrapolate':

import numpy as np
import pandas as pd
import scipy.interpolate as interpolate
np.random.seed(2018)

df_new = pd.DataFrame(np.random.randn(5,7), columns=[402.3, 407.2, 412.3, 415.8, 419.9, 423.5, 428.3])
wl = np.array([400.0, 408.2, 412.5, 417.2, 420.5, 423.3, 425.0, 500])

x = df_new.columns
y = df_new.values
newx = x.union(wl)
result = pd.DataFrame(
    interpolate.interp1d(x, y, fill_value='extrapolate')(newx),
    columns=newx)

yields

      400.0     402.3     407.2     408.2     412.3     412.5     415.8     417.2     419.9     420.5     423.3     423.5     425.0     428.3      500.0
0 -0.679793 -0.276768  0.581851  0.889017  2.148399  1.952520 -1.279487 -0.671080  0.502277  0.561236  0.836376  0.856029  0.543898 -0.142790 -15.062654
1  0.484717  0.110079 -0.688065 -0.468138  0.433564  0.437944  0.510221  0.279613 -0.165131 -0.362906 -1.285854 -1.351779 -0.758526  0.546631  28.904127
2  1.303039  1.230655  1.076446  0.628001 -1.210625 -1.158971 -0.306677 -0.563028 -1.057419 -0.814173  0.320975  0.402057  0.366778  0.289165  -1.397156
3  2.385057  1.282733 -1.065696 -1.191370 -1.706633 -1.618985 -0.172797 -0.092039  0.063710  0.114863  0.353577  0.370628 -0.246613 -1.604543 -31.108665
4 -3.360837 -2.165729  0.380370  0.251572 -0.276501 -0.293597 -0.575682 -0.235060  0.421854  0.469009  0.689062  0.704780  0.498724  0.045401  -9.804075
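
The large values in the 500.0 column come from linear extrapolation: with the default kind='linear', interp1d extends the slope of the outermost segment. A small sketch with hand-picked numbers (illustrative only, not from the answer) makes this easier to see:

```python
import numpy as np
import scipy.interpolate as interpolate

x = np.array([1.0, 2.0, 4.0])
y = np.array([[10.0, 20.0, 40.0],   # row 0: y = 10 * x everywhere
              [0.0, 1.0, 5.0]])     # row 1: slope 1, then slope 2

# interp1d interpolates along the last axis, so each row is handled separately.
f = interpolate.interp1d(x, y, fill_value='extrapolate')

# 0.0 lies left of the data, 3.0 inside it, 6.0 right of it.
out = f(np.array([0.0, 3.0, 6.0]))
print(out)
# row 0 -> 0, 30, 60   (the line y = 10 * x continues on both sides)
# row 1 -> -1, 3, 9    (slope 1 continued to the left, slope 2 to the right)
```

The further a target lies from the data, the more the result depends on that last slope, so treat far-away extrapolated values with caution.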


If you wish to create a DataFrame containing only the wl columns, you could sub-select those columns using result[wl], or you could simply interpolate only at the wl values:

result_wl = pd.DataFrame(
    interpolate.interp1d(x, y, fill_value='extrapolate')(wl),
    columns=wl)
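
As a quick check that the two routes agree, the sketch below builds both and compares them. Passing index=df_new.index is an addition not in the original answer; it keeps the original row labels, which would otherwise be replaced by a default 0..n-1 index.

```python
import numpy as np
import pandas as pd
import scipy.interpolate as interpolate

np.random.seed(2018)
df_new = pd.DataFrame(
    np.random.randn(5, 7),
    columns=[402.3, 407.2, 412.3, 415.8, 419.9, 423.5, 428.3])
wl = np.array([400.0, 408.2, 412.5, 417.2, 420.5, 423.3, 425.0, 500])

x = df_new.columns
y = df_new.values
f = interpolate.interp1d(x, y, fill_value='extrapolate')

# Route 1: interpolate on the union of old and new columns, then sub-select.
newx = x.union(wl)
result = pd.DataFrame(f(newx), columns=newx, index=df_new.index)
selected = result[wl]

# Route 2: interpolate only at the wl values.
result_wl = pd.DataFrame(f(wl), columns=wl, index=df_new.index)

print(np.allclose(selected.values, result_wl.values))  # True
```

The second route skips the intermediate columns, so it does less work when only the wl values are needed.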
