Clustered standard errors in statsmodels with categorical variables (Python)


Problem description

I want to run a regression in statsmodels that uses categorical variables and clustered standard errors.

I have a dataset with the columns institution, treatment, year, and enrollment. Treatment is a dummy variable, institution is a string, and the others are numeric. I've made sure to drop any null values.

import statsmodels.formula.api as smf
df.dropna()
reg_model = smf.ols(
    "enroll ~ treatment + C(year) + C(institution)", df
).fit(cov_type='cluster', cov_kwds={'groups': df['institution']})

I get the following error:

ValueError: The weights and list don't have the same length.

Is there a way to fix this so that my standard errors are clustered?

Recommended answer

You need to pass cov_type='cluster' to fit.

cov_type is a keyword argument, and it does not land in the correct position when keyword arguments are passed positionally. See http://www.statsmodels.org/stable/generated/statsmodels.regression.linear_model.OLS.fit.html

In general, statsmodels does not guarantee backwards compatibility when keyword arguments are passed positionally; that is, keyword positions might change in future versions.
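As a minimal sketch of the keyword form, the following fits the same formula on synthetic data. The data, the random seed, and the use of pd.factorize to build integer cluster labels are assumptions made for illustration; they are not part of the original question.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical panel: 10 institutions observed over 6 years.
rng = np.random.default_rng(0)
institutions = [f"inst_{i}" for i in range(10)]
years = range(2010, 2016)
df = pd.DataFrame(
    [(inst, year) for inst in institutions for year in years],
    columns=["institution", "year"],
)

# Half of the institutions are treated from 2013 onward.
treated = df["institution"].isin(institutions[:5])
df["treatment"] = (treated & (df["year"] >= 2013)).astype(int)
df["enroll"] = 100 + 5 * df["treatment"] + rng.normal(0, 2, len(df))

# One integer cluster label per row, in the same row order as df.
groups = pd.factorize(df["institution"])[0]

model = smf.ols("enroll ~ treatment + C(year) + C(institution)", data=df)
# cov_type and cov_kwds are passed by name, never by position.
result = model.fit(cov_type="cluster", cov_kwds={"groups": groups})
print(result.bse["treatment"])
```

Passing every option to fit by name keeps the call independent of the order of parameters in the fit signature.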

However, I don't understand where the ValueError is coming from. Python tracebacks are very informative, so when asking a question it helps to include either the full traceback or at least the last few lines that show where the exception is raised.
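The thread does not establish the cause of the ValueError. One possibility, offered as an assumption rather than a diagnosis, is a length mismatch: df.dropna() returns a new DataFrame instead of modifying df, so if rows with missing values remain, the formula interface may drop them while df['institution'] keeps its full length. The sketch below uses hypothetical data, keeps the result of dropna(), builds the cluster labels from that same cleaned frame, and prints the full traceback if fit still fails.

```python
import traceback

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: 8 institutions, 5 years, one missing outcome value.
rng = np.random.default_rng(1)
names = [f"inst_{i}" for i in range(8)]
df = pd.DataFrame({
    "institution": np.repeat(names, 5),
    "year": np.tile(np.arange(2010, 2015), 8),
})
treated = df["institution"].isin(names[:4])
df["treatment"] = (treated & (df["year"] >= 2012)).astype(int)
df["enroll"] = 100 + 5 * df["treatment"] + rng.normal(0, 2, len(df))
df.loc[3, "enroll"] = np.nan

# dropna() returns a new DataFrame, so keep the result.
clean = df.dropna().reset_index(drop=True)

# Cluster labels come from the same frame that is passed to the model.
groups = pd.factorize(clean["institution"])[0]

try:
    result = smf.ols(
        "enroll ~ treatment + C(year) + C(institution)", data=clean
    ).fit(cov_type="cluster", cov_kwds={"groups": groups})
except ValueError:
    # Show the full traceback so the failing call is visible, then re-raise.
    traceback.print_exc()
    raise

print(len(clean), int(result.nobs))
```

If the two printed numbers match, the cluster labels and the estimation sample have the same length.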
