Statsmodels Categorical Data from Formula (using pandas)

Problem description

I am trying to finish up a homework assignment, and to do so I need to use categorical variables in statsmodels (due to a refusal to conform to using Stata like everyone else). I have spent some time reading through the documentation for both Patsy and Statsmodels, and I can't quite figure out why this snippet of code isn't working. I have tried breaking it down and building the model with the patsy commands directly, but I get the same error.

I currently have:

import numpy as np
import pandas as pd
import statsmodels.formula.api as sm


# This is where I'm getting data
data = pd.read_csv("http://people.stern.nyu.edu/wgreene/Econometrics/bankdata.csv")

# I want to use this form for my regression
form = "C ~ Q1 + Q2 + Q3 + Q4 + Q5 + C(BANK)"

# Do the regression
mod = sm.ols(form, data=data)
reg = mod.fit()
print(reg.summary2())

This code raises the error TypeError: 'Series' object is not callable. There is a very similar example on the statsmodels website that seems to work fine, and I'm not sure what the difference is between what I'm doing and what they're doing.

Many thanks for your help.

Cheers

Recommended answer

The problem is that C is the name of one of the columns in your DataFrame and also patsy's way of denoting that you want a categorical variable. The easiest fix is to rename the column:

# rename(columns=...) relabels the column; rename_axis with a dict no longer renames labels in current pandas
data = data.rename(columns={'C': 'C_data'})
form = "C_data ~ Q1 + Q2 + Q3 + Q4 + Q5 + C(BANK)"

Then the call to sm.ols will just work.
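A minimal end-to-end sketch of the fix, assuming statsmodels and pandas are installed. The data frame below is a synthetic stand-in for bankdata.csv (random values, only Q1 and Q2, and a made-up BANK column with levels "a", "b", "c"), not the real data. The column is renamed with `DataFrame.rename(columns=...)`, the label-renaming call in current pandas.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for bankdata.csv: made-up numbers, not the real data set.
rng = np.random.default_rng(0)
n = 60
data = pd.DataFrame({
    "C": rng.normal(size=n),
    "Q1": rng.normal(size=n),
    "Q2": rng.normal(size=n),
    # Deterministic, balanced levels so every category is present.
    "BANK": np.tile(["a", "b", "c"], n // 3),
})

# Rename the column whose name clashes with patsy's C() helper.
data = data.rename(columns={"C": "C_data"})

# C(BANK) now resolves to the categorical helper instead of a column.
form = "C_data ~ Q1 + Q2 + C(BANK)"
reg = smf.ols(form, data=data).fit()
print(reg.params.index.tolist())
```

With the name clash gone, `C(BANK)` expands into treatment-coded dummies, so the fit has an intercept, two BANK dummy coefficients (one level is the baseline), and the Q1 and Q2 slopes.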

The error message TypeError: 'Series' object is not callable can be interpreted as follows:

  • patsy interprets C as a column of the data frame, in this case the Series data['C'].
  • Because C is immediately followed by parentheses, statsmodels (via patsy) then tries to call data['C'] as a function with the argument BANK. Series objects don't implement a __call__ method, hence the error message that the 'Series' object is not callable.
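The lookup order described above can be mimicked with a `collections.ChainMap` in which the DataFrame's columns are searched before the formula helpers. This is a simplified, hypothetical model of the name resolution, not patsy's actual code: `fake_C` and `evaluate` are illustrative stand-ins for patsy's real `C()` and its formula evaluator, and the tiny data frame is made up.

```python
from collections import ChainMap

import pandas as pd


# Hypothetical stand-in for patsy's C(): just marks the values as categorical.
def fake_C(values):
    return pd.Categorical(values)


# Made-up frame that, like bankdata.csv, has a column literally named "C".
data = pd.DataFrame({"C": [1.0, 2.0, 3.0], "BANK": ["a", "b", "a"]})


def evaluate(expr, frame):
    # Data columns are searched first and the formula helpers second,
    # so a column named "C" hides the helper with the same name.
    namespace = ChainMap(dict(frame.items()), {"C": fake_C})
    return eval(expr, {"__builtins__": {}}, namespace)


try:
    evaluate("C(BANK)", data)
except TypeError as exc:
    print(exc)  # 'Series' object is not callable

# After renaming the column, "C" falls through to the helper again.
renamed = data.rename(columns={"C": "C_data"})
print(evaluate("C(BANK)", renamed))
```

The same expression fails or succeeds depending only on whether a column called C exists, which is why renaming the column is enough to fix the original formula.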

Good luck!