Vectorized way to combine two dataframes into a y=mx+b result

Question

I have two pandas dataframes. One is a time series of m and b values from the usual y=mx+b function. The other dataframe (which could be considered a Series) holds the x value for several different categories. (Yes, x is held fixed and the linear parameters change in this situation.)

What I want is a new dataframe whose index is formula_df.index, whose columns are staff_df.columns, and whose values are mx+b, obtained by multiplying the values of staff_df by formula_df['m'] and adding formula_df['b'].

As a concrete example, final_df.loc[pd.IndexSlice['20191204', 'matt']] would be 22 * 0.90 + 10 = 29.8.

import pandas as pd

# x values: a single row with one column per category
staff = {"mike": 18,  "matt": 22,  "dave": 25, "kanad": 15, 'elder':85}
staff_df = pd.DataFrame(data=staff, index = ['measurement'])
staff_df.index.name="evaluation"

# m and b values: one row per date
the_data = {'m': [.5, .1, .3, .9, 1.2], 'b':[12, 14, 8, 10, 20]}
formula_df = pd.DataFrame(index=pd.date_range(start="20191201", periods=5, freq="D"),
                         data=the_data)
formula_df.index.name="Date"

Even just the mx part of the equation fails. I have tried things like formula_df['m']*staff_df, but it gives a nonsensical result. I suppose that if I knew numpy better it would be clear what to do, but alas I don't. I suspect this involves broadcasting, but I'm not sure.
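A likely reason the attempt looks like nonsense: when a Series and a DataFrame are combined with the * operator, pandas aligns the Series index against the DataFrame columns. The dates in formula_df.index share no labels with the names in staff_df.columns, so every cell comes out as NaN. The sketch below reproduces this with the question's data; the names misaligned and all_nan are introduced here only for illustration.

```python
import pandas as pd

staff_df = pd.DataFrame(
    {"mike": 18, "matt": 22, "dave": 25, "kanad": 15, "elder": 85},
    index=["measurement"],
)
formula_df = pd.DataFrame(
    {"m": [.5, .1, .3, .9, 1.2], "b": [12, 14, 8, 10, 20]},
    index=pd.date_range(start="20191201", periods=5, freq="D"),
)

# The Series index (dates) is matched against the DataFrame columns (names).
# No label is shared, so the result has the union of both as columns
# and contains only NaN.
misaligned = formula_df["m"] * staff_df

all_nan = bool(misaligned.isna().all().all())
print(misaligned.shape, all_nan)
```

Getting a usable product therefore means either telling pandas which axis to align on or dropping to plain NumPy arrays, where no label alignment takes place.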

Recommended answer

Create final_df with the DataFrame constructor, using formula_df.index as the index, staff_df.columns as the columns, and the first row of staff_df converted to a NumPy array as the data. Then multiply by formula_df['m'] with DataFrame.mul and add formula_df['b'] with DataFrame.add, passing axis=0 in both calls so that each Series is aligned on the row index:

final_df = pd.DataFrame(data=[staff_df.iloc[0].to_numpy()], 
                        index=formula_df.index, 
                        columns=staff_df.columns)
final_df = final_df.mul(formula_df['m'], axis=0).add(formula_df['b'], axis=0)

print(final_df)
            mike  matt  dave  kanad  elder
Date                                      
2019-12-01  21.0  23.0  24.5   19.5   54.5
2019-12-02  15.8  16.2  16.5   15.5   22.5
2019-12-03  13.4  14.6  15.5   12.5   33.5
2019-12-04  26.2  29.8  32.5   23.5   86.5 <- 22 * 0.90 + 10 = 29.8
2019-12-05  41.6  46.4  50.0   38.0  122.0
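Since the question mentions broadcasting, here is a sketch of the same computation done directly on NumPy arrays. It is an alternative, not the answer's method: m and b are reshaped into column vectors of shape (5, 1), x stays a flat array of shape (5,), and NumPy broadcasts them to a 5x5 grid. The short names x, m, b and value are introduced here only for illustration.

```python
import pandas as pd

staff_df = pd.DataFrame(
    {"mike": 18, "matt": 22, "dave": 25, "kanad": 15, "elder": 85},
    index=["measurement"],
)
formula_df = pd.DataFrame(
    {"m": [.5, .1, .3, .9, 1.2], "b": [12, 14, 8, 10, 20]},
    index=pd.date_range(start="20191201", periods=5, freq="D"),
)
formula_df.index.name = "Date"

x = staff_df.iloc[0].to_numpy()            # shape (5,): one x per person
m = formula_df["m"].to_numpy()[:, None]    # shape (5, 1): one slope per date
b = formula_df["b"].to_numpy()[:, None]    # shape (5, 1): one intercept per date

# (5, 1) * (5,) broadcasts to (5, 5): rows are dates, columns are people
final_df = pd.DataFrame(m * x + b,
                        index=formula_df.index,
                        columns=staff_df.columns)

# Check against the concrete example from the question: 22 * 0.90 + 10
value = final_df.loc[pd.Timestamp("2019-12-04"), "matt"]
print(round(value, 1))
```

This version never asks the DataFrame constructor to stretch a single row of data across five index labels, which may make it easier to reason about; the mul/add version keeps everything label-aligned inside pandas.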
