Predicting out future values using OLS regression (Python, StatsModels, Pandas)


Question

I'm currently trying to implement a multiple linear regression (MLR) in Python and am not sure how to apply the coefficients I've found to future values.

import pandas as pd
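import statsmodels.api as sm   # OLS lives here; recent statsmodels.formula.api only exposes lowercase ols
import statsmodels.api as sm2  # same module; alias kept because the code below calls sm2.add_constant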

TV = [230.1, 44.5, 17.2, 151.5, 180.8]
Radio = [37.8,39.3,45.9,41.3,10.8]
Newspaper = [69.2,45.1,69.3,58.5,58.4]
Sales = [22.1, 10.4, 9.3, 18.5,12.9]
df = pd.DataFrame({'TV': TV, 
                   'Radio': Radio, 
                   'Newspaper': Newspaper, 
                   'Sales': Sales})

Y = df.Sales
X = df[['TV','Radio','Newspaper']]
X = sm2.add_constant(X)
model = sm.OLS(Y, X).fit()
>>> model.params
const       -0.141990
TV           0.070544
Radio        0.239617
Newspaper   -0.040178
dtype: float64

So let's say I want to predict "Sales" for the following DataFrame:

EDIT

TV     Radio    Newspaper    Sales
230.1  37.8       69.2       22.4
44.5   39.3       45.1       10.1
...    ...        ...        ...
25      15        15
30      20        22
35      22        36

I've been trying a method I found here, but I can't seem to get it working: Forecasting using Pandas OLS

Thanks!

Recommended answer

Assuming df2 is your new out-of-sample DataFrame:

model = sm.OLS(Y, X).fit()
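# df2 holds only out-of-sample rows, so take all of them (a mask built from df would not align with df2)
new_x = df2[['TV', 'Radio', 'Newspaper']].values
new_x = sm2.add_constant(new_x, has_constant='add')  # sm2 = statsmodels.api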
y_predict = model.predict(new_x)

>>> y_predict
array([ 4.61319034,  5.88274588,  6.15220225])

You can assign the results directly to df2 as follows:

df2.loc[:, 'Sales'] = model.predict(new_x)

To fill missing Sales values in the original DataFrame with predictions from your regression, try:

X = df.loc[df.Sales.notnull(), ['TV', 'Radio', 'Newspaper']]
X = sm2.add_constant(X)
Y = df[df.Sales.notnull()].Sales

model = sm.OLS(Y, X).fit()
new_x = df.loc[df.Sales.isnull(), ['TV', 'Radio', 'Newspaper']]
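# has_constant='add' forces the intercept column even when only one row has missing Sales
new_x = sm2.add_constant(new_x, has_constant='add')  # sm2 = statsmodels.api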

df.loc[df.Sales.isnull(), 'Sales'] = model.predict(new_x)
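
As an alternative sketch (not part of the original answer), the formula interface in statsmodels.formula.api adds the intercept itself, so add_constant is not needed for either fitting or prediction. The DataFrame below is an assumption: it combines the five observed rows from the question with the three rows whose Sales is missing. The names missing, fit and predicted are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Five observed rows from the question plus three rows whose Sales is unknown.
df = pd.DataFrame({
    'TV': [230.1, 44.5, 17.2, 151.5, 180.8, 25, 30, 35],
    'Radio': [37.8, 39.3, 45.9, 41.3, 10.8, 15, 20, 22],
    'Newspaper': [69.2, 45.1, 69.3, 58.5, 58.4, 15, 22, 36],
    'Sales': [22.1, 10.4, 9.3, 18.5, 12.9, np.nan, np.nan, np.nan],
})

missing = df['Sales'].isnull()

# Fit only on rows with an observed Sales value; the formula adds the intercept.
fit = smf.ols('Sales ~ TV + Radio + Newspaper', data=df[~missing]).fit()

# predict() rebuilds the design matrix, intercept included, from the raw columns.
predicted = fit.predict(df.loc[missing, ['TV', 'Radio', 'Newspaper']])

# The result is indexed like the rows it came from, so the assignment aligns by index.
df.loc[missing, 'Sales'] = predicted
print(df)
```

With the formula interface the intercept appears in fit.params under the name Intercept rather than const.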
