Statsmodels Categorical Data from Formula (using pandas)
Question
I am trying to finish a homework assignment, and to do so I need to use categorical variables in statsmodels (because I refuse to conform and use Stata like everyone else). I have spent some time reading the documentation for both patsy and statsmodels, and I can't figure out why this snippet of code isn't working. I have tried breaking the formula down and building it with the patsy commands directly, but I get the same error.
Here is what I currently have:
import numpy as np
import pandas as pd
import statsmodels.formula.api as sm
# This is where I'm getting data
data = pd.read_csv("http://people.stern.nyu.edu/wgreene/Econometrics/bankdata.csv")
# I want to use this form for my regression
form = "C ~ Q1 + Q2 + Q3 + Q4 + Q5 + C(BANK)"
# Do the regression
mod = sm.ols(form, data=data)
reg = mod.fit()
print(reg.summary2())
This code raises the following error: TypeError: 'Series' object is not callable. There is a very similar example on the statsmodels website that seems to work fine, and I'm not sure what the difference is between what I'm doing and what they're doing.
Any help is much appreciated.
Cheers
Recommended answer
The problem is that C is the name of one of the columns in your DataFrame as well as the patsy way of denoting that you want a categorical variable. The easiest fix is to rename the column:
# Rename the column so it no longer collides with patsy's C(...)
data = data.rename(columns={'C': 'C_data'})
form = "C_data ~ Q1 + Q2 + Q3 + Q4 + Q5 + C(BANK)"
Then the call to sm.ols will just work.
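For anyone who cannot reach the original CSV, the fix can be tried on made-up data. The following is only a sketch: the toy DataFrame, its values, and the shortened formula (just Q1 and BANK) are assumptions for illustration, not the real bank data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as sm

# Synthetic stand-in for the bank data (all values are made up).
rng = np.random.default_rng(0)
n = 30
toy = pd.DataFrame({
    "C": rng.normal(size=n),               # response column whose name collides with C(...)
    "Q1": rng.normal(size=n),
    "BANK": np.repeat([1, 2, 3], n // 3),  # three banks, ten rows each
})

# Rename the colliding column, then mark BANK as categorical with C(...).
toy = toy.rename(columns={"C": "C_data"})
fit = sm.ols("C_data ~ Q1 + C(BANK)", data=toy).fit()

# The fitted model has an intercept, Q1, and one dummy per non-reference bank.
param_names = list(fit.params.index)
print(param_names)
```

With three banks, the first level serves as the reference category, so two dummy coefficients appear for BANK.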
The error message TypeError: 'Series' object is not callable can be interpreted as follows:
- patsy interprets C as a column of the data frame. In this case that is the Series data['C'].
- Because C is immediately followed by parentheses, evaluating the formula tries to call data['C'] as a function with the argument BANK. Series objects don't implement a __call__ method, hence the error message 'Series' object is not callable.
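The two steps above can be mimicked in plain Python. This is a simplified analogy, not patsy's actual code: the Series class, the C helper, and the evaluate function below are hypothetical stand-ins, and the sketch assumes that column names are looked up before helper functions.

```python
class Series(list):
    """Hypothetical stand-in for pandas.Series: holds values, defines no __call__."""


def C(values):
    """Hypothetical stand-in for the categorical marker used in formulas."""
    return ("categorical", list(values))


def evaluate(expression, columns):
    # Assumption: data columns take precedence over helper functions,
    # so a column named "C" hides the helper C().
    namespace = {"C": C, **columns}
    return eval(expression, {}, namespace)


columns = {"C": Series([0.5, 1.5]), "BANK": Series([1, 2])}

try:
    evaluate("C(BANK)", columns)
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # 'Series' object is not callable

# After renaming the column, the name C refers to the helper again.
renamed = {"C_data": columns["C"], "BANK": columns["BANK"]}
result = evaluate("C(BANK)", renamed)
print(result)  # ('categorical', [1, 2])
```

Renaming the column removes the collision without touching the formula's C(BANK) term, which is why the fix is a single rename.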
Good luck!