Vectorized way to combine two dataframes into a y=mx+b result
Question
I have two pandas dataframes. One is a timeseries of m and b values from the typical y=mx+b function. The other dataframe (which could be considered a series) holds the x value for several different categories. (Yes, x is held fixed, and the linear parameters change in this situation.)
What I want to do is generate a new dataframe whose index is formula_df.index, whose columns are staff_df.columns, and whose values are mx+b, obtained by multiplying the values of staff_df by formula_df['m'] and adding formula_df['b'].
As a concrete example, final_df.loc[pd.IndexSlice['20191204', 'matt']] would be 22 * 0.90 + 10 = 29.8.
```python
import pandas as pd

# x values: one measurement per staff member
staff = {"mike": 18, "matt": 22, "dave": 25, "kanad": 15, "elder": 85}
staff_df = pd.DataFrame(data=staff, index=['measurement'])
staff_df.index.name = "evaluation"
# One slope (m) and one intercept (b) per day
the_data = {'m': [.5, .1, .3, .9, 1.2], 'b': [12, 14, 8, 10, 20]}
formula_df = pd.DataFrame(index=pd.date_range(start="20191201", periods=5, freq="d"),
                          data=the_data)
formula_df.index.name = "Date"
```
Even just computing the mx part of the equation fails. I have tried things like formula_df['m']*staff_df, but that gives a nonsense result. I suppose if I knew numpy better it would be clear what to do; alas, I don't. I suspect this involves broadcasting, but I'm not sure.
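A short sketch of why that attempt misbehaves, assuming a recent pandas version: when a Series is combined with a DataFrame through an arithmetic operator, pandas matches the Series index against the DataFrame's columns. Here the Series index holds dates while the columns hold names, so nothing lines up, the result gets the union of both label sets as columns, and every cell is NaN. The variable name bad is only illustrative.

```python
import pandas as pd

# Same setup as in the question
staff_df = pd.DataFrame(
    data={"mike": 18, "matt": 22, "dave": 25, "kanad": 15, "elder": 85},
    index=["measurement"],
)
formula_df = pd.DataFrame(
    index=pd.date_range(start="20191201", periods=5, freq="d"),
    data={"m": [0.5, 0.1, 0.3, 0.9, 1.2], "b": [12, 14, 8, 10, 20]},
)

# The Series index (dates) is aligned against the DataFrame columns (names),
# not against its rows, so no label matches
bad = formula_df["m"] * staff_df

print(bad.shape)               # still one row; columns = 5 names + 5 dates
print(bad.isna().all().all())  # every value is NaN
```

The fix is to tell pandas to align the Series on the rows instead, which is what axis=0 in DataFrame.mul and DataFrame.add does.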
Answer
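Build final_df with the DataFrame constructor, using formula_df.index as the index and staff_df.columns as the columns, and fill it by repeating the first (only) row of staff_df, converted to a NumPy array, once per date. Then multiply by the m column with DataFrame.mul and add the b column with DataFrame.add, passing axis=0 in both cases so that each Series is matched to the rows by date: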
```python
import numpy as np

# Repeat the single row of x values once per date (shape: n_dates x n_staff)
final_df = pd.DataFrame(data=np.tile(staff_df.iloc[0].to_numpy(), (len(formula_df), 1)),
                        index=formula_df.index,
                        columns=staff_df.columns)
# axis=0 aligns m and b on the row index (dates), not on the columns
final_df = final_df.mul(formula_df['m'], axis=0).add(formula_df['b'], axis=0)
print(final_df)
```

```text
            mike  matt  dave  kanad  elder
Date
2019-12-01  21.0  23.0  24.5   19.5   54.5
2019-12-02  15.8  16.2  16.5   15.5   22.5
2019-12-03  13.4  14.6  15.5   12.5   33.5
2019-12-04  26.2  29.8  32.5   23.5   86.5  <- 22 * 0.90 + 10 = 29.8
2019-12-05  41.6  46.4  50.0   38.0  122.0
```
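Since the question suspects broadcasting: the same table can also be computed with plain NumPy broadcasting, without building the repeated frame first. This is an alternative sketch, not the answer's method. It turns m and b into column vectors of shape (n_dates, 1) and the staff values into a 1-D row of shape (n_staff,), so m * x + b broadcasts to an (n_dates, n_staff) array that is then wrapped back into a DataFrame.

```python
import numpy as np
import pandas as pd

staff_df = pd.DataFrame(
    data={"mike": 18, "matt": 22, "dave": 25, "kanad": 15, "elder": 85},
    index=["measurement"],
)
formula_df = pd.DataFrame(
    index=pd.date_range(start="20191201", periods=5, freq="d", name="Date"),
    data={"m": [0.5, 0.1, 0.3, 0.9, 1.2], "b": [12, 14, 8, 10, 20]},
)

# Column vectors of shape (5, 1): one slope / intercept per date
m = formula_df["m"].to_numpy()[:, None]
b = formula_df["b"].to_numpy()[:, None]
# 1-D row of shape (5,): one x value per staff member
x = staff_df.iloc[0].to_numpy()

# (5, 1) * (5,) -> (5, 5), then + (5, 1) -> (5, 5)
final_df = pd.DataFrame(m * x + b, index=formula_df.index, columns=staff_df.columns)
print(final_df)
```

This route works positionally rather than by label, which is harmless here because m and b come from the same frame and the column labels are taken from the same row as x.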