How to add a sum-to-zero constraint to a GLM in Python?

Question

I have a model set up in Python using the statsmodels glm function, but now I want to add a sum-to-zero constraint to it.

The model is defined as follows:

import statsmodels.api as sm  # needed for sm.families
import statsmodels.formula.api as smf
model = smf.glm(formula="A ~ B + C + D", data=data, family=sm.families.Poisson()).fit()

In R, to add the constraint, I would simply do something like this:

model <- glm(A ~ B + C + D - 1, family=poisson(), data=data, contrasts=list(C="contr.sum", D="contr.sum"))

That adds the sum-to-zero constraint to both C and D, but I am not sure how to achieve the same in Python.

I have seen that there is a fit_constrained() method available, but I am not sure how to use it, or whether it is even the right tool for what I need.

http://statsmodels.sourceforge.net/devel/generated/statsmodels.genmod.generalized_linear_model.GLM.fit_constrained.html#statsmodels.genmod.generalized_linear_model.GLM.fit_constrained

Can anyone offer advice on applying this constraint?

Recommended answer

Here is an example to illustrate fit_constrained, using the Gaussian family since I didn't quickly find a Poisson example with categorical variables:

import pandas
import statsmodels.api as sm
from statsmodels.formula.api import glm

url = 'http://www.ats.ucla.edu/stat/data/hsb2.csv'
hsb2 = pandas.read_csv(url)

mod = glm("write ~ C(race) - 1", data=hsb2)
res = mod.fit()
print(res.summary())

Constrain all coefficients to sum to zero:

res_c = mod.fit_constrained('C(race)[1] + C(race)[2] + C(race)[3] + C(race)[4] = 0')
print(res_c.summary())

                 Generalized Linear Model Regression Results                  
==============================================================================
Dep. Variable:                  write   No. Observations:                  200
Model:                            GLM   Df Residuals:                      197
Model Family:                Gaussian   Df Model:                            2
Link Function:               identity   Scale:                   1232.08314649
Method:                          IRLS   Log-Likelihood:                -993.41
Date:                Wed, 25 Mar 2015   Deviance:                   2.4149e+05
Time:                        16:42:37   Pearson chi2:                 2.41e+05
No. Iterations:                     1                                         
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
C(race)[1]     1.0002    221.565      0.005      0.996      -433.260   435.260
C(race)[2]   -41.1814    267.253     -0.154      0.878      -564.988   482.626
C(race)[3]    -6.3498    235.771     -0.027      0.979      -468.453   455.754
C(race)[4]    46.5311    100.184      0.464      0.642      -149.827   242.889
==============================================================================

Model has been estimated subject to linear equality constraints.

Constraints are comma-separated and default to equal zero:

res_c2 = mod.fit_constrained('C(race)[1] + C(race)[2], C(race)[3] + C(race)[4]')
print(res_c2.summary())

The last print gives:

                 Generalized Linear Model Regression Results                  
==============================================================================
Dep. Variable:                  write   No. Observations:                  200
Model:                            GLM   Df Residuals:                      198
Model Family:                Gaussian   Df Model:                            1
Link Function:               identity   Scale:                   1438.99574167
Method:                          IRLS   Log-Likelihood:                -1008.9
Date:                Wed, 25 Mar 2015   Deviance:                   2.8204e+05
Time:                        16:42:37   Pearson chi2:                 2.82e+05
No. Iterations:                     1                                         
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
C(race)[1]    13.6286    242.003      0.056      0.955      -460.689   487.946
C(race)[2]   -13.6286    242.003     -0.056      0.955      -487.946   460.689
C(race)[3]   -41.6606    111.458     -0.374      0.709      -260.115   176.794
C(race)[4]    41.6606    111.458      0.374      0.709      -176.794   260.115
==============================================================================

Model has been estimated subject to linear equality constraints.

I'm not sure how to write patsy formulas so that none of the levels is dropped when there are several categorical explanatory variables.
