Pandas Dataframe AttributeError: 'DataFrame' object has no attribute 'design_info'

Problem description

I am trying to use the predict() function of the statsmodels.formula.api OLS implementation. When I pass a new DataFrame to the function to get predicted values for an out-of-sample dataset, result.predict(newdf) raises the following error: 'DataFrame' object has no attribute 'design_info'. What does this mean and how do I fix it? The full traceback is:

    p = result.predict(newdf)
  File "C:\Python27\lib\site-packages\statsmodels\base\model.py", line 878, in predict
    exog = dmatrix(self.model.data.orig_exog.design_info.builder,
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 2088, in __getattr__
    (type(self).__name__, name))
AttributeError: 'DataFrame' object has no attribute 'design_info'

Edit: Here is a reproducible example. The error appears to occur when I pickle and then unpickle the result object (which I need to do in my actual project):

import cPickle
import pandas as pd
import numpy as np
import statsmodels.formula.api as sm

df = pd.DataFrame({"A": [10,20,30,324,2353], "B": [20, 30, 10, 1, 2332], "C": [0, -30, 120, 11, 2]})

result = sm.ols(formula="A ~ B + C", data=df).fit()
print(result.summary())

test1 = result.predict(df)  # works

# Save the fitted result object with pickle protocol 2
with open('resultobject', 'wb') as f_myfile:
    cPickle.dump(result, f_myfile, 2)
print("Result Object Saved")

# Load it back; the with block closes the file afterwards
with open('resultobject', 'rb') as f_myfile:
    model = cPickle.load(f_myfile)

test2 = model.predict(df)  # raises the AttributeError

Recommended answer

As far as I know, pickling and unpickling a pandas DataFrame does not save and restore attributes that a user has attached to it.

Since the formula information (design_info) is currently stored as an attribute on the DataFrame that holds the original design matrix (orig_exog), this information is lost after unpickling a Results or Model instance.
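The mechanism can be illustrated without pandas at all. The following is only a minimal sketch using the standard library: Frame is a hypothetical stand-in class, not the pandas implementation, and it assumes a container whose pickling hooks serialize only its data.

```python
import pickle


class Frame(object):
    """Hypothetical stand-in for a container that pickles only its data."""

    def __init__(self, rows):
        self.rows = rows

    def __getstate__(self):
        # Only the data is serialized; attributes attached later are ignored.
        return {"rows": self.rows}

    def __setstate__(self, state):
        self.rows = state["rows"]


frame = Frame([[20, 0], [30, -30]])
frame.design_info = "A ~ B + C"  # metadata attached after construction

# Round trip through pickle, using protocol 2 as in the question
restored = pickle.loads(pickle.dumps(frame, 2))

print(restored.rows)                     # the data survives the round trip
print(hasattr(restored, "design_info"))  # the attached attribute does not
```

Any code that later looks up restored.design_info fails with an AttributeError, which is the same kind of failure that predict() runs into in the traceback.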

If you don't use categorical variables or transformations, then the correct design matrix can be built with patsy.dmatrix. I think the following should work:

import patsy

# Right-hand side of the original formula "A ~ B + C"
x = patsy.dmatrix("B + C", data=df)  # df is data for prediction
test2 = model.predict(x, transform=False)

Constructing the design matrix for the prediction directly should also work. Note that we need to explicitly add the constant that the formula adds by default.

from statsmodels.api import add_constant
test2 = model.predict(add_constant(df[["B", "C"]]), transform=False)
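To see why the constant is needed: the formula "A ~ B + C" fits an intercept plus one coefficient each for B and C, so every row passed for prediction needs a leading 1. Below is a rough plain-Python sketch of that arithmetic. The coefficient values and input rows are made up for illustration and do not come from the model in the question.

```python
# Hypothetical coefficients in the order the formula produces them:
# Intercept, B, C. These numbers are made up for illustration.
params = [1.5, 0.5, -2.0]

new_rows = [(20, 0), (30, -30)]  # (B, C) pairs to predict for

# Prepend the constant 1.0 that the formula would have added automatically.
design = [[1.0, b, c] for b, c in new_rows]

# Each prediction is the dot product of one design row with the coefficients.
predictions = [sum(p * x for p, x in zip(params, row)) for row in design]
print(predictions)
```

Without the column of ones, each row would have only two entries for three coefficients, and the intercept would have nothing to multiply.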

If the formula and design matrix contain (stateful) transformations or categorical variables, then it's not possible to conveniently construct the design matrix without the original formula information. Constructing it by hand and doing all the calculations explicitly is difficult in this case, and loses all the advantages of using formulas.

The only real solution is to pickle the formula information design_info independently of the dataframe orig_exog.
