Creating "exponential smoothing" variables - Pandas
Problem description
I have a data frame with IDs and the choices made by those IDs. The set of alternatives (choices) is a list of integers: [10, 20, 30, 40]. Note: it is important to use this list. Let's call it 'choice_list'.
This is the data frame:
ID Choice
1 10
1 30
1 10
2 40
2 40
2 40
3 20
3 40
3 10
I want to create a variable for each alternative: '10_Var', '20_Var', '30_Var', '40_Var'. In the first row of each ID, if the first choice was 10, for example, the variable '10_Var' gets the value 0.6 (a parameter), and each of the other variables ('20_Var', '30_Var', '40_Var') gets the value (1 - 0.6) / 4 = 0.1. The number 4 is the number of alternatives.
This is how the data should look after the step above:
ID Choice 10_Var 20_Var 30_Var 40_Var
1 10 0.6 0.1 0.1 0.1
1 30
1 10
2 40 0.1 0.1 0.1 0.6
2 40
2 40
3 20 0.1 0.6 0.1 0.1
3 40
3 10
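The first-row rule can be sketched with NumPy as follows. The names `P` and `first_row` are illustrative and do not come from the question; the divisor is the number of alternatives, as described above.

```python
import numpy as np

choice_list = np.array([10, 20, 30, 40])  # the alternatives from the question
P = 0.6                                   # smoothing parameter (assumed name)

def first_row(first_choice):
    """Initial values for one ID: P for the chosen alternative,
    (1 - P) / number_of_alternatives for every other alternative."""
    return np.where(choice_list == first_choice, P, (1 - P) / len(choice_list))

print(first_row(10))  # values for '10_Var', '20_Var', '30_Var', '40_Var'
```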
From the second row onward, the variable '10_Var', for example, gets the value (0.6 * previous value) + (1 - 0.6) * {1 if the last choice (the choice in the previous row) was 10, 0 otherwise}, and likewise for each of the other variables.
Note: this should be done separately for each ID.
Expected result:
ID Choice 10_Var 20_Var 30_Var 40_Var
1 10 0.6 0.1 0.1 0.1
1 30 0.76 0.06 0.06 0.06
1 10 0.456 0.036 0.436 0.036
2 40 0.1 0.1 0.1 0.6
2 40 0.06 0.06 0.06 0.76
2 40 0.036 0.036 0.036 0.856
3 20 0.1 0.6 0.1 0.1
3 40 0.06 0.76 0.06 0.06
3 10 0.036 0.456 0.036 0.436
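As a check on the table above, the recurrence can be run on a single ID's sequence of choices. This is a minimal sketch; `smooth_sequence` and `P` are hypothetical names, and "last choice" is taken to mean the choice in the previous row, which is what the expected values imply.

```python
import numpy as np

choice_list = np.array([10, 20, 30, 40])
P = 0.6  # smoothing parameter (assumed name)

def smooth_sequence(seq):
    """Return one row of smoothed values per choice in `seq` (a single ID)."""
    row = np.where(choice_list == seq[0], P, (1 - P) / len(choice_list))
    rows = [row]
    for prev_choice in seq[:-1]:
        # Decay the previous values, then add (1 - P) to the alternative
        # that was chosen in the previous row.
        row = P * row + (1 - P) * (choice_list == prev_choice)
        rows.append(row)
    return np.array(rows)

# Choices of ID 1 in the example data
print(smooth_sequence([10, 30, 10]).round(3))
```

For ID 1, the third row follows from the second: '10_Var' is 0.6 * 0.76 = 0.456, and '30_Var' is 0.6 * 0.06 + 0.4 = 0.436, because the previous choice was 30.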
Recommended answer
This solution might be easier to understand than the previous solutions. It might be slower, though (it would need to be tested on large data frames).
It is also parametrized, as the user asked.
import numpy as np
import pandas as pd

# Parameter
P = 0.6

def exp_smooth(g):
    # First row: P for the chosen alternative, (1 - P) / n for the others
    rows = [np.where(choices == g.iloc[0].Choice, P, (1-P)/len(choices))]
    for i in range(len(g) - 1):
        # Later rows: decay the previous row, add (1 - P) for the previous choice
        rows.append(rows[-1]*P+(1-P)*np.where(choices == g.iloc[i].Choice, 1, 0))
    return np.array(rows)

df = pd.DataFrame([[1, 10], [1, 30], [1, 10],
                   [2, 40], [2, 40], [2, 40],
                   [3, 20], [3, 40], [3, 10]],
                  columns=('ID', 'Choice'))

# Alternatives, taken from the values that occur in the data
choices = np.unique(df.Choice)

# One block of rows per ID, stacked in group order
var_arr = np.concatenate([exp_smooth(g) for _, g in df.groupby("ID")], axis=0)
var_df = pd.DataFrame(var_arr, columns=[f"var_{c}" for c in choices])
df = pd.concat([df, var_df], axis=1)
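The code above derives the alternatives from the data and relies on the rows already being ordered by ID. A possible variant, offered as a sketch rather than as the answerer's method, takes the alternatives from an explicit `choice_list`, names the columns as in the question ('10_Var', ...), and keeps each group's original index so the join does not depend on row order. The names `exp_smooth_group`, `value_cols`, and `result` are hypothetical.

```python
import numpy as np
import pandas as pd

P = 0.6                                   # smoothing parameter
choice_list = np.array([10, 20, 30, 40])  # explicit alternatives from the question
value_cols = [f"{c}_Var" for c in choice_list]

def exp_smooth_group(g):
    """Smoothed values for one ID, indexed like that ID's rows in `df`."""
    chosen = g["Choice"].to_numpy()
    rows = [np.where(choice_list == chosen[0], P, (1 - P) / len(choice_list))]
    for prev_choice in chosen[:-1]:
        # Decay the previous row and add (1 - P) for the previous choice
        rows.append(P * rows[-1] + (1 - P) * (choice_list == prev_choice))
    return pd.DataFrame(np.vstack(rows), index=g.index, columns=value_cols)

df = pd.DataFrame({"ID": [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   "Choice": [10, 30, 10, 40, 40, 40, 20, 40, 10]})

var_df = pd.concat([exp_smooth_group(g) for _, g in df.groupby("ID")])
result = df.join(var_df)  # aligns on the original row index
print(result.round(3))
```

Because each block carries its group's index, `df.join` places the values on the right rows even if the IDs are interleaved in the input.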