Python Statsmodels: OLS regressor not predicting
Question
I wrote the following piece of code but I just cannot get the 'predict' method to work:
import statsmodels.api as sm
from statsmodels.formula.api import ols
ols_model = ols('Consumption ~ Disposable_Income', df).fit()
My 'df' is a pandas dataframe with column headings 'Consumption' and 'Disposable_Income'. When I run, for example,
ols_model.predict([1000.0])
I get: "TypeError: list indices must be integers, not str"
When I run, for example,
ols_model.predict(df['Disposable_Income'].values)
I get: "IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices"
I'm very confused because I thought these two formats are precisely what the documentation says - put in an array of values for the x variable. How exactly am I supposed to use the 'predict' method?
This is what my df looks like:
Recommended answer
Since you work with the formulas in the model, the formula information will also be used in the interpretation of the exog in predict.
I think you need to use a dataframe or a dictionary with the correct name of the explanatory variable(s).
ols_model.predict({'Disposable_Income':[1000.0]})
or something like
import pandas as pd
df_predict = pd.DataFrame([[1000.0]], columns=['Disposable_Income'])
ols_model.predict(df_predict)
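To make the two calls above concrete, here is a minimal self-contained sketch. The original df is not shown, so the data below is synthetic and purely an assumption for illustration; only the column names come from the question. It fits the same formula and predicts from both a DataFrame and a dict keyed by the explanatory variable's name.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Synthetic stand-in for the asker's df (assumption: the real data is not shown).
rng = np.random.default_rng(0)
income = np.linspace(500.0, 2000.0, 30)
df = pd.DataFrame({
    "Disposable_Income": income,
    "Consumption": 50.0 + 0.8 * income + rng.normal(0.0, 5.0, size=income.size),
})

ols_model = ols("Consumption ~ Disposable_Income", df).fit()

# New observations must carry the same column name the formula refers to.
new_values = [1000.0, 1500.0]
df_predict = pd.DataFrame({"Disposable_Income": new_values})

pred_from_frame = np.asarray(ols_model.predict(df_predict))
pred_from_dict = np.asarray(ols_model.predict({"Disposable_Income": new_values}))

# Reference value computed by hand from the fitted coefficients.
manual = (ols_model.params["Intercept"]
          + ols_model.params["Disposable_Income"] * np.array(new_values))

print(pred_from_frame)
print(pred_from_dict)
```

Both calls should agree with the hand-computed intercept plus slope times income, since the formula machinery adds the constant column automatically.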
Another option, if the full design matrix for prediction (including the constant) is available, is to skip the formula handling in predict.
AFAIR, this should also work:
ols_model.predict([[1, 1000.0]], transform=False)
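As a rough check of the transform=False route, the sketch below compares it against the formula-based prediction. The data is again synthetic (an assumption, since the original df is not shown), and the column order of the hand-built row, constant first and then Disposable_Income, is assumed to match the order of the fitted parameters for this particular formula.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Synthetic stand-in data (assumption: not the asker's actual df).
rng = np.random.default_rng(1)
income = np.linspace(500.0, 2000.0, 30)
df = pd.DataFrame({
    "Disposable_Income": income,
    "Consumption": 50.0 + 0.8 * income + rng.normal(0.0, 5.0, size=income.size),
})

ols_model = ols("Consumption ~ Disposable_Income", df).fit()

# The fitted parameters show the design-matrix column order: Intercept first.
param_names = list(ols_model.params.index)

# Full design row built by hand: [constant, Disposable_Income].
X_new = [[1.0, 1000.0]]
pred_raw = np.asarray(ols_model.predict(X_new, transform=False))

# Same prediction through the formula handling, for comparison.
pred_formula = np.asarray(
    ols_model.predict(pd.DataFrame({"Disposable_Income": [1000.0]}))
)

print(param_names)
print(pred_raw, pred_formula)
```

With transform=False nothing adds the constant for you, so leaving out the leading 1.0 would produce a shape mismatch or a wrong prediction rather than a helpful error.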