Pandas DataFrame:复杂线性插值 [英] Pandas DataFrame: Complex linear interpolation

查看:664
本文介绍了Pandas DataFrame:复杂线性插值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个包含4个部分的数据框

I have a dataframe with 4 sections

第1部分:产品详细信息

Section 1: Product details

第2节:6种基于一系列模拟的潜在乘积值

Section 2: 6 Potential product values based on a range of simulations

第3节:模拟输入参数的上限和下限

Section 3: Upper and lower bound for the input parameter to the simulations

第4部分:为输入参数随机生成的值

Section 4: Randomly generated values for the input parameters

第2节是通过在上限和下限之间相等的时间间隔对产品进行定价来生成的.

Section 2 is generated by pricing the product at equal intervals between the upper and lower bound.

我需要获取第4节中的值,并找出相应的产品值.这是此数据框的可能设置:

I need to take the values in Section 4 and figure out the corresponding product value. Here is a possible setup for this dataframe:

table2 = pd.DataFrame({
        'Product Type': ['A', 'B', 'C', 'D'],
        'State_1_Value': [10, 11, 12, 13],
    'State_2_Value': [20, 21, 22, 23],
    'State_3_Value': [30, 31, 32, 33],
    'State_4_Value': [40, 41, 42, 43],
    'State_5_Value': [50, 51, 52, 53],
    'State_6_Value': [60, 61, 62, 63],
    'Lower_Bound': [-1, 1, .5, 5],
    'Upper_Bound': [1, 2, .625, 15],
    'sim_1': [0, 0, .61, 7],
    'sim_2': [1, 1.5, .7, 9],
    })

>>> table2
   Lower_Bound Product Type  State_1_Value  State_2_Value  State_3_Value  \
0         -1.0            A             10             20             30   
1          1.0            B             11             21             31   
2          0.5            C             12             22             32   
3          5.0            D             13             23             33   

   State_4_Value  State_5_Value  State_6_Value  Upper_Bound  sim_1  sim_2  
0             40             50             60        1.000    0.0    1.0  
1             41             51             61        2.000    0.0    1.5  
2             42             52             62        0.625    0.61    0.7  
3             43             53             63       15.000    7.0    9.0  

我将通过几个计算示例来弄清楚我的问题是什么.

I will run through a couple examples of this calculation to make it clear what my question is.

产品A-sim_2 此处的输入为1.0.这等于该产品的上限.因此,模拟值等于state_6的值-60

Product A - sim_2 The input here is 1.0. This is equal to the upper bound for this product. Therefore the simulation value is equivalent to the state_6 value - 60

产品B-sim_2 输入为1.5. LB到UB的范围是(1,2),因此6个状态是{1,1.2,1.4,1.6,1.8,2}. 1.5恰好位于state_3(其值为31)和State 4(其值为41)的中间.因此,模拟值为36.

Product B - sim_2 The input here is 1.5. the LB to UB range is (1,2), therefore the 6 states are {1,1.2,1.4,1.6,1.8,2}. 1.5 is exactly in the middle of state_3 which has a value of 31 and state 4 which has a value of 41. Therefore the simulation value is 36.

产品C-sim_1 此处输入为0.61. LB到UB的范围是(.5,.625),因此6个状态是{.5,.525,.55,.575,.6,.625}. .61在状态5和状态6之间.具体地说,它将落入的存储桶将为5 *(.61-.5)/(.625-.5)+1 = 5.4(将其乘以5即为该数字间隔-您可以通过其他方式进行计算并获得相同的结果).然后,为了计算该值,我们使用该存储桶权衡状态5和状态6的值:(62-52)*(5.4-5)+52 = 56.

Product C - sim_1 The input here is .61. The LB to UB range is (.5,.625), therefore the 6 states are {.5,.525,.55,.575,.6,.625}. .61 is between state 5 and 6. Specifically the bucket it would fall under would be 5*(.61-.5)/(.625-.5)+1 = 5.4 (it is multiplied by 5 as that is the number of intervals - you can calculate it other ways and get the same result). Then to calculate the value we use that bucket in a weighing of the values for state 5 and state 6: (62-52)*(5.4-5)+52 = 56.

产品B-sim_1 此处的输入为0,低于1的下限.因此,我们需要外推该值.我们使用与上述相同的公式,仅使用状态1和状态2的值进行推断.桶将为5 *(0-1)/(2-1)+1 = -4.这两个值分别用于11和21,因此值为(21-11)*(-4-1)+ 11 = -39

Product B - sim_1 The input here is 0 which is below the lower bound of 1. Therefore we need to extrapolate the value. We use the same formula as above we just use the values of state 1 and state 2 to extrapolate. The bucket would be 5*(0-1)/(2-1)+1 = -4. The two values used at 11 and 21, so the value is (21-11)*(-4-1)+11= -39

我还简化了问题以尝试可视化解决方案,我的最终代码需要在500个值和10,000个模拟上运行,并且数据框将包含约200行.

I've also simplified the problem to try to visualize the solution, my final code needs to run on 500 values and 10,000 simulations, and the dataframe will have about 200 rows.

这是我用于插值的公式,尽管我并不专门致力于这些公式.

Here are the formulas I've used for the interpolation although I'm not committed to them specifically.

桶= N *(sim_value-LB)/(UB-LB)+ 1 其中N是间隔数

Bucket = N*(sim_value-LB)/(UB-LB) + 1 where N is the number of intervals

然后,nLower是存储桶正下方的状态值,nHigher是存储桶正上方的状态值.如果存储桶在UB/LB之外,则将nLower和nHigher强制为前两个值或后两个值.

then nLower is the state value directly below the bucket, and nHigher is the state value directly above the bucket. If the bucket is outside the UB/LB, then force nLower and nHigher to be either the first two or last two values.

最终值=(nHigher-nLower)*(桶1-number_value_of_nLower)+ nLower

Final_value = (nHigher-nLower)*(Bucket1 - number_value_of_nLower)+nLower

总而言之,我的问题是如何根据所提供的输入数据的组合来生成最终结果.对我来说最具挑战性的部分是如何使Bucket编号与nLower和nHigher值建立连接.

To summarize, my question is how I can generate the final results based on the combination of input data provided. The most challenging part to me is how to make the connection from the Bucket number to the nLower and nHigher values.

推荐答案

我在这里发布了没有循环的高级解决方案:

I posted a superior solution with no loops here:

避免在熊猫数据框中循环的替代方法

df= pd.DataFrame({
            'Product Type': ['A', 'B', 'C', 'D'],
            'State_1_Value': [10, 11, 12, 13],
        'State_2_Value': [20, 21, 22, 23],
        'State_3_Value': [30, 31, 32, 33],
        'State_4_Value': [40, 41, 42, 43],
        'State_5_Value': [50, 51, 52, 53],
        'State_6_Value': [60, 61, 62, 63],
        'Lower_Bound': [-1, 1, .5, 5],
        'Upper_Bound': [1, 2, .625, 15],
        'sim_1': [0, 0, .61, 7],
        'sim_2': [1, 1.5, .7, 9],
        })


buckets = df.ix[:,-2:].sub(df['Lower_Bound'],axis=0).div(df['Upper_Bound'].sub(df['Lower_Bound'],axis=0),axis=0) * 5 + 1
low = buckets.applymap(int)
high = buckets.applymap(int) + 1
low = low.applymap(lambda x: 1 if x < 1 else x)
low = low.applymap(lambda x: 5 if x > 5 else x)
high = high.applymap(lambda x: 6 if x > 6 else x)
high = high.applymap(lambda x: 2 if x < 2 else x)
low_value = pd.DataFrame(df.filter(regex="State|Type").values[np.arange(low.shape[0])[:,None], low])
high_value = pd.DataFrame(df.filter(regex="State|Type").values[np.arange(high.shape[0])[:,None], high])
df1 = (high_value - low_value).mul((buckets - low).values) + low_value
df1['Product Type'] = df['Product Type']

这篇关于Pandas DataFrame:复杂线性插值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆