Applying interpolation on DataFrame based on another DataFrame
I have a DataFrame to which I would like to add new columns based on the values of a specific column, where the result depends on data contained in another DataFrame.
More specifically, I have

df_original =
  Crncy  Spread  Duration
0   EUR     100       1.2
1   nan     nan       nan
2           100      3.46
3   CHF     200       2.5
4   USD      50       5.0
...

df_interpolation =
   CRNCY  TENOR  Adj_EUR  Adj_USD
0    EUR      1       10       20
1    EUR      2       20       30
2    EUR      5       30       40
3    EUR      7       40       50
...
10   CHF      1       50       10
11   CHF      2       60       20
12   CHF      5       70       30
...
and would now like to add the columns Adj_EUR and Adj_USD to df_original for each row, based on the values of Crncy and Duration, using standard linear interpolation.
So, for each available Crncy, we want to use TENOR and Adj_USD/Adj_EUR from df_interpolation together with Duration from df_original to form the interpolation.
E.g., pseudo-code using the optimize package from scipy:
from scipy import optimize

# Do this for both 'Adj_EUR' and 'Adj_USD'; shown here for 'Adj_EUR'
for curr, df in df_original.groupby('Crncy'):
    mask = df_interpolation['CRNCY'] == curr
    # .to_numpy() replaces the removed DataFrame.as_matrix()
    x_data = df_interpolation.loc[mask, 'TENOR'].to_numpy()
    y_data = df_interpolation.loc[mask, 'Adj_EUR'].to_numpy()
    # Linear fit
    z_linear = optimize.curve_fit(lambda t, a, b: a + b * t, x_data, y_data)[0]
    # Somehow add the values back to df_original in a new column
    df['Adj_EUR'] = z_linear[0] + z_linear[1] * df['Duration']
Yielding

  Crncy  Spread  Duration  Adj_EUR  Adj_USD
0   EUR     100       1.2       12       22
1   nan     nan       nan      0.0      0.0
...
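
As a quick sanity check on the expected first row, the sketch below recomputes the EUR values at Duration 1.2 with numpy.interp, which does piecewise-linear interpolation between neighbouring tenors. The array names are made up for illustration and only the EUR rows shown above are used. It also fits one straight line through all four points with numpy.polyfit, as a stand-in for the curve_fit call in the pseudo-code, to show that such a global fit need not reproduce the interpolated value.

```python
import numpy as np

# EUR rows of df_interpolation as listed in the question (illustrative names)
tenor = np.array([1.0, 2.0, 5.0, 7.0])
adj_eur = np.array([10.0, 20.0, 30.0, 40.0])
adj_usd = np.array([20.0, 30.0, 40.0, 50.0])

duration = 1.2  # Duration of the EUR row in df_original

# Piecewise-linear interpolation between the tenors 1 and 2
eur_value = np.interp(duration, tenor, adj_eur)
usd_value = np.interp(duration, tenor, adj_usd)

# One least-squares line through all four points (what a global linear fit does)
slope, intercept = np.polyfit(tenor, adj_eur, 1)
eur_fitted = intercept + slope * duration

print(eur_value, usd_value, eur_fitted)
```

The interpolated values match the 12 and 22 in the table, while the fitted line gives roughly 13.2 for Adj_EUR, so interpolation rather than a single fit appears to be what the expected output calls for.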
Any clue on how to do this?
Much appreciated.
Suppose we have df1 and df2:
>>> df1
  Crncy  Spread  Duration
0   EUR     100       1.2
1   CHF     200       2.5
>>> df2
  CRNCY  TENOR  Adj_EUR  Adj_USD
0   EUR      1       10       20
1   EUR      2       20       30
2   EUR      5       30       40
3   EUR      7       40       50
4   CHF      1       50       10
5   CHF      2       60       20
6   CHF      5       70       30
Transform df1 and df2 into dataframes with the same columns:
import numpy as np
import pandas as pd

df1['Adj_EUR'] = np.nan
df1['Adj_USD'] = np.nan
df1['left'] = 1  # marks rows that come from df1
>>> df1
  Crncy  Spread  Duration  Adj_EUR  Adj_USD  left
0   EUR     100       1.2      NaN      NaN     1
1   CHF     200       2.5      NaN      NaN     1
df2 = df2.rename(columns={'CRNCY': 'Crncy', 'TENOR': 'Duration'})
df2['Spread'] = np.nan
df2['left'] = 0  # marks rows that come from df2
>>> df2
  Crncy  Duration  Adj_EUR  Adj_USD  Spread  left
0   EUR         1       10       20     NaN     0
1   EUR         2       20       30     NaN     0
2   EUR         5       30       40     NaN     0
3   EUR         7       40       50     NaN     0
4   CHF         1       50       10     NaN     0
5   CHF         2       60       20     NaN     0
6   CHF         5       70       30     NaN     0
Now concatenate df1 and df2 row-wise and sort by Crncy and Duration:
df3 = pd.concat([df1, df2], ignore_index=True, sort=False).sort_values(['Crncy', 'Duration'])
>>> df3
  Crncy  Spread  Duration  Adj_EUR  Adj_USD  left
6   CHF     NaN       1.0     50.0     10.0     0
7   CHF     NaN       2.0     60.0     20.0     0
1   CHF   200.0       2.5      NaN      NaN     1
8   CHF     NaN       5.0     70.0     30.0     0
2   EUR     NaN       1.0     10.0     20.0     0
0   EUR   100.0       1.2      NaN      NaN     1
3   EUR     NaN       2.0     20.0     30.0     0
4   EUR     NaN       5.0     30.0     40.0     0
5   EUR     NaN       7.0     40.0     50.0     0
Then interpolate the NaN values of each column using Duration as the index, and drop the unnecessary columns:
df3 = df3.set_index('Duration')
# Interpolate within each currency, using the Duration index as the x-axis
df4 = df3.groupby(['Crncy']).apply(lambda x: x.interpolate(method='index')).reset_index()
df4 = df4[['Crncy', 'Spread', 'Duration', 'Adj_EUR', 'Adj_USD', 'left']]
# Keep only the rows that came from df1 and drop the helper column
df4 = df4.loc[df4['left'] == 1].drop('left', axis=1).reset_index(drop=True)
>>> df4
  Crncy  Spread  Duration    Adj_EUR    Adj_USD
0   CHF   200.0       2.5  61.666667  21.666667
1   EUR   100.0       1.2  12.000000  22.000000
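
The reason for moving Duration into the index is that interpolate(method='index') treats the index values as x-coordinates, whereas the default method='linear' treats rows as equally spaced. A minimal sketch of the difference, using only the CHF Adj_EUR values from df3 above (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

# CHF slice of the combined frame: Duration as index, Adj_EUR as values
s = pd.Series([50.0, 60.0, np.nan, 70.0], index=[1.0, 2.0, 2.5, 5.0])

# Uses the index (Duration) as the x-axis: 2.5 lies 1/6 of the way from 2 to 5
by_index = s.interpolate(method='index')

# Ignores the index and assumes equally spaced rows: midpoint of 60 and 70
by_position = s.interpolate(method='linear')

print(by_index.loc[2.5], by_position.loc[2.5])
```

With method='index' the gap is filled with about 61.67, matching the df4 output, while the positional default would give 65.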
Hope this helps.
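
As an alternative, here is a sketch that skips the concat step and calls numpy.interp once per currency and column. It may also be easier to keep working across pandas versions, since whether groupby(...).apply prepends the group keys to the result index has changed between releases, which affects the reset_index call above. The helper name add_interpolated and its parameters are hypothetical, not part of pandas, and the sketch assumes df has a unique index. Note that numpy.interp clamps to the first or last curve value for durations outside the tenor range, which may or may not be what you want.

```python
import numpy as np
import pandas as pd

# Sample data from the answer above
df1 = pd.DataFrame({
    'Crncy': ['EUR', 'CHF'],
    'Spread': [100, 200],
    'Duration': [1.2, 2.5],
})
df2 = pd.DataFrame({
    'CRNCY': ['EUR'] * 4 + ['CHF'] * 3,
    'TENOR': [1, 2, 5, 7, 1, 2, 5],
    'Adj_EUR': [10, 20, 30, 40, 50, 60, 70],
    'Adj_USD': [20, 30, 40, 50, 10, 20, 30],
})


def add_interpolated(df, curves, value_cols=('Adj_EUR', 'Adj_USD')):
    """Hypothetical helper: interpolate each value column at df['Duration'], per currency."""
    out = df.copy()
    for col in value_cols:
        out[col] = np.nan
    # Rows whose Crncy is NaN are skipped by groupby and keep NaN adjustments
    for curr, rows in df.groupby('Crncy'):
        curve = curves[curves['CRNCY'] == curr].sort_values('TENOR')
        if curve.empty:
            continue  # no curve for this currency: leave NaN
        for col in value_cols:
            out.loc[rows.index, col] = np.interp(
                rows['Duration'].to_numpy(dtype=float),
                curve['TENOR'].to_numpy(dtype=float),
                curve[col].to_numpy(dtype=float),
            )
    return out


result = add_interpolated(df1, df2)
print(result)
```

Unlike the concat approach, this keeps the original row order and index of df1.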