Creating "exponential smoothing" variables - Pandas


Question

I have a data frame with IDs and the choices made by those IDs. The set of alternatives (choices) is a list of integers: [10, 20, 30, 40]. Note: it is important to use this list. Let's call it 'choice_list'.

This is the data frame:

ID  Choice
1   10
1   30
1   10
2   40
2   40
2   40
3   20
3   40
3   10

I want to create a variable for each alternative: '10_Var', '20_Var', '30_Var', '40_Var'. In the first row of each ID, if the first choice was '10', for example, the variable '10_Var' gets the value 0.6 (a parameter), and each of the other variables ('20_Var', '30_Var', '40_Var') gets the value (1 - 0.6) / 4 = 0.1. The number 4 is the number of alternatives.

This is how the data should look after the step above:

ID  Choice  10_Var  20_Var  30_Var  40_Var
1   10      0.6     0.1     0.1     0.1
1   30              
1   10              
2   40      0.1     0.1     0.1     0.6
2   40              
2   40              
3   20      0.1     0.6     0.1     0.1
3   40              
3   10              
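
The first-row rule can be sketched in a few lines of NumPy. This is only an illustration of the rule as stated in the question; the helper name `initial_row` is hypothetical, and `P` and `choice_list` simply restate the parameter and the list of alternatives given above.

```python
import numpy as np

P = 0.6                                   # smoothing parameter from the question
choice_list = np.array([10, 20, 30, 40])  # alternatives from the question

def initial_row(first_choice):
    """Hypothetical helper: variable values for the first row of one ID."""
    # The chosen alternative gets P; every other alternative gets (1 - P) / 4,
    # where 4 is the number of alternatives.
    return np.where(choice_list == first_choice, P, (1 - P) / len(choice_list))

# First rows of IDs 1, 2 and 3 in the sample data (first choices 10, 40, 20)
for first_choice in (10, 40, 20):
    print(first_choice, initial_row(first_choice))
```

Note that with this rule the four values in a first row sum to 0.9, not 1, because the divisor is the number of alternatives rather than the number of unchosen ones.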

From the second row onward, the variable '10_Var', for example, gets the value (0.6 * previous value) + (1 - 0.6) * {1 if the last choice (the choice in the previous row) was 10, 0 otherwise}, and likewise for each of the other variables.

Note: This should be done separately for each ID.
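
As a quick check of the update rule, the two steps for ID 1 can be worked through by hand. The sketch below assumes that "last choice" means the choice in the previous row; the helper name `update` is hypothetical.

```python
import numpy as np

P = 0.6                                   # smoothing parameter from the question
choice_list = np.array([10, 20, 30, 40])  # alternatives from the question

def update(prev_values, last_choice):
    """Hypothetical helper: one smoothing step for all variables at once."""
    # 1.0 for the alternative chosen in the previous row, 0.0 for the others
    indicator = (choice_list == last_choice).astype(float)
    return P * prev_values + (1 - P) * indicator

row1 = np.array([0.6, 0.1, 0.1, 0.1])  # ID 1, first row (choice 10)
row2 = update(row1, last_choice=10)    # the previous row chose 10
row3 = update(row2, last_choice=30)    # the previous row chose 30
print(row2)
print(row3)
```

For '10_Var' this gives 0.6 * 0.6 + 0.4 * 1 = 0.76 in the second row and 0.6 * 0.76 + 0.4 * 0 = 0.456 in the third.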

Expected result:

ID  Choice  10_Var  20_Var  30_Var  40_Var
1   10      0.6     0.1     0.1     0.1
1   30      0.76    0.06    0.06    0.06
1   10      0.456   0.036   0.436   0.036
2   40      0.1     0.1     0.1     0.6
2   40      0.06    0.06    0.06    0.76
2   40      0.036   0.036   0.036   0.856
3   20      0.1     0.6     0.1     0.1
3   40      0.06    0.76    0.06    0.06
3   10      0.036   0.456   0.036   0.436

Recommended answer

This solution might be easier to understand than the previous solutions. It might be slower, though (this would need to be tested on large data frames).

It is also parameterized, as the user asked.

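import numpy as np
import pandas as pd

# Smoothing parameter
P = 0.6

def exp_smooth(g):
    # First row of the group: P for the chosen alternative, (1 - P) / n for the others
    rows = [np.where(choices == g.iloc[0].Choice, P, (1-P)/len(choices))]
    # Each later row: P * previous values + (1 - P) * indicator of the previous row's choice
    for i in range(len(g) - 1):
        rows.append(rows[-1]*P+(1-P)*np.where(choices == g.iloc[i].Choice, 1, 0))
    return np.array(rows)

df = pd.DataFrame([[1, 10], [1, 30], [1, 10],
                   [2, 40], [2, 40], [2, 40],
                   [3, 20], [3, 40], [3, 10]],
                  columns=('ID', 'Choice'))
# Alternatives, taken here from the values that occur in the data
choices = np.unique(df.Choice)

# Smooth each ID separately and stack the results in group order.
# This relies on df being sorted by ID and having the default 0..n-1 index.
var_arr = np.concatenate([exp_smooth(g) for _, g in df.groupby("ID")], axis=0)
var_df = pd.DataFrame(var_arr, columns=[f"var_{c}" for c in choices])
df = pd.concat([df, var_df], axis=1)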
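
A possible variant, sketched under two assumptions that are not part of the answer above: the alternatives come from the explicit 'choice_list' that the question asks for (so an alternative that nobody chose would still get a column), and the results are written back by row index, so the frame does not have to be sorted by ID. The helper name `smooth_group` is hypothetical, and the columns are named '10_Var', '20_Var', ... as in the question.

```python
import numpy as np
import pandas as pd

P = 0.6                         # smoothing parameter from the question
choice_list = [10, 20, 30, 40]  # explicit alternatives, as the question requests

def smooth_group(choice_values):
    """Hypothetical helper: smoothed values for the choices of one ID."""
    alts = np.asarray(choice_list)
    out = np.empty((len(choice_values), len(alts)))
    # First row: P for the chosen alternative, (1 - P) / n for the others
    out[0] = np.where(alts == choice_values[0], P, (1 - P) / len(alts))
    # Later rows: P * previous values + (1 - P) * indicator of the previous choice
    for i in range(1, len(choice_values)):
        out[i] = P * out[i - 1] + (1 - P) * (alts == choice_values[i - 1])
    return out

df = pd.DataFrame({"ID": [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   "Choice": [10, 30, 10, 40, 40, 40, 20, 40, 10]})

var_cols = [f"{c}_Var" for c in choice_list]
vars_df = pd.DataFrame(np.nan, index=df.index, columns=var_cols)
for _, g in df.groupby("ID", sort=False):
    # Write each group's values back to the rows it came from
    vars_df.loc[g.index, var_cols] = smooth_group(g["Choice"].to_numpy())

result = df.join(vars_df)
print(result)
```

On the sample data this should reproduce the expected table from the question. Like the answer above, it loops over rows in Python, so it may be slow on large data frames.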
