How to have multiple groups in a Python statsmodels linear mixed effects model?

Question

I am trying to use the Python statsmodels linear mixed effects model to fit a model that has two random intercepts, i.e. two grouping factors. I cannot figure out how to initialize the model so that I can do this.

Here's the example. I have data that looks like the following (taken from here):

subject gender  scenario    attitude    frequency
F1  F   1   pol 213.3
F1  F   1   inf 204.5
F1  F   2   pol 285.1
F1  F   2   inf 259.7
F1  F   3   pol 203.9
F1  F   3   inf 286.9
F1  F   4   pol 250.8
F1  F   4   inf 276.8

I want to make a linear mixed effects model with two random effects: one for the subject group and one for the scenario group. I am trying to do this:

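import statsmodels.api as sm
# `data` is a pandas DataFrame holding the politeness data shown above.
# Passing two grouping columns at once is the attempt that raises the error below.
model = sm.MixedLM.from_formula("frequency ~ attitude + gender", data,
                                groups=data[['subject', 'scenario']])
result = model.fit()
print(result.summary())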

I keep getting this error:

LinAlgError: Singular matrix

It works fine in R. When I use lme4's formula interface, the model fits without problems:

politeness.model = lmer(frequency ~ attitude + gender + 
        (1|subject)  + (1|scenario), data=politeness)
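For reference, the lme4 formula above corresponds to the following model with two crossed random intercepts. The indices are notation introduced here for illustration: $i$ indexes subjects, $j$ scenarios and $k$ the attitude condition; $\mathrm{pol}$ and $\mathrm{male}$ are 0/1 indicators matching the `attitude[T.pol]` and `gender[T.M]` terms in the statsmodels output.

```latex
y_{ijk} = \beta_0 + \beta_1\,\mathrm{pol}_{ijk} + \beta_2\,\mathrm{male}_i + u_i + v_j + \varepsilon_{ijk},
\qquad u_i \sim N(0,\sigma_u^2),\quad v_j \sim N(0,\sigma_v^2),\quad \varepsilon_{ijk} \sim N(0,\sigma^2)
```

with $u_i$, $v_j$ and $\varepsilon_{ijk}$ mutually independent. Each observation belongs to one subject and one scenario at the same time, so there is no single grouping variable under which both $u_i$ and $v_j$ fit.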

I don't understand why this is happening. It works when I use any one random effect/group, e.g.

model = sm.MixedLM.from_formula("frequency ~ attitude + gender", data, groups=data['subject'])

Then I get:

                 Mixed Linear Model Regression Results
===============================================================
Model:                MixedLM   Dependent Variable:   frequency
No. Observations:     83        Method:               REML     
No. Groups:           6         Scale:                850.9456 
Min. group size:      13        Likelihood:           -393.3720
Max. group size:      14        Converged:            Yes      
Mean group size:      13.8                                     
---------------------------------------------------------------
                 Coef.   Std.Err.   z    P>|z|  [0.025   0.975]
---------------------------------------------------------------
Intercept        256.785   15.226 16.864 0.000  226.942 286.629
attitude[T.pol]  -19.415    6.407 -3.030 0.002  -31.972  -6.858
gender[T.M]     -108.325   21.064 -5.143 0.000 -149.610 -67.041
Intercept RE     603.948   23.995                              
===============================================================

Or, if I do this:

model = sm.MixedLM.from_formula("frequency ~ attitude + gender", data, groups=data['scenario'])

This is the result I get:

              Mixed Linear Model Regression Results
================================================================
Model:               MixedLM    Dependent Variable:    frequency
No. Observations:    83         Method:                REML     
No. Groups:          7          Scale:                 1110.3788
Min. group size:     11         Likelihood:            -402.5003
Max. group size:     12         Converged:             Yes      
Mean group size:     11.9                                       
----------------------------------------------------------------
                 Coef.   Std.Err.    z    P>|z|  [0.025   0.975]
----------------------------------------------------------------
Intercept        256.892    8.120  31.637 0.000  240.977 272.807
attitude[T.pol]  -19.807    7.319  -2.706 0.007  -34.153  -5.462
gender[T.M]     -108.603    7.319 -14.838 0.000 -122.948 -94.257
Intercept RE     182.718    5.502                               
================================================================

I have no idea what's going on. I feel like I am missing something foundational in the statistics of the problem.

Recommended answer

You are trying to fit a model with crossed random effects, i.e., you want to allow for consistent variation among subjects across scenarios as well as consistent variation among scenarios across subjects. You can use multiple random-effects terms in statsmodels, but they must be nested. Fitting crossed (as opposed to nested) random effects requires more sophisticated algorithms, and indeed the statsmodels documentation says (as of 25 Aug 2016, emphasis added):

Some limitations of the current implementation are that it does not support structure more complex on the residual errors (they are always homoscedastic), and it does not support crossed random effects. We hope to implement these features for the next release.
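To make the crossed/nested distinction concrete, here is a small stdlib-only sketch. The helper `is_nested` and both toy designs are hypothetical illustrations, not part of statsmodels: a factor is nested in another when each of its levels occurs under exactly one level of the other, and crossed when levels are shared across the other factor's levels.

```python
from itertools import product


def is_nested(rows, inner, outer):
    """Hypothetical helper: True if every level of `inner` occurs under exactly one level of `outer`."""
    parents = {}
    for row in rows:
        parents.setdefault(row[inner], set()).add(row[outer])
    return all(len(p) == 1 for p in parents.values())


# Hypothetical crossed design: every subject sees every scenario.
crossed = [{"subject": s, "scenario": c} for s, c in product(["F1", "M1"], [1, 2])]

# Hypothetical nested design: each scenario label belongs to a single subject.
nested = [
    {"subject": "F1", "scenario": "F1:1"},
    {"subject": "F1", "scenario": "F1:2"},
    {"subject": "M1", "scenario": "M1:1"},
    {"subject": "M1", "scenario": "M1:2"},
]

print(is_nested(crossed, "scenario", "subject"))  # False: scenario 1 appears under F1 and M1
print(is_nested(nested, "scenario", "subject"))   # True
```

In the politeness data, scenarios are shared across subjects, which is the crossed case.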

As far as I can see, your choices are: (1) fall back to a nested model, i.e. fit the model as though scenario were nested within subject or vice versa (or try both and see whether the difference matters); or (2) fall back to lme4, either in R directly or via rpy2.
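Treating scenario as nested within subject amounts to giving every subject/scenario combination its own label, the same idea as lme4's `(1|subject/scenario)`, which expands to `(1|subject) + (1|subject:scenario)`. A minimal stdlib sketch using the sample rows from the question; `nested_label` is a hypothetical helper, and swapping its arguments gives the reverse nesting:

```python
# Sample rows from the question: (subject, gender, scenario, attitude, frequency)
rows = [
    ("F1", "F", 1, "pol", 213.3),
    ("F1", "F", 1, "inf", 204.5),
    ("F1", "F", 2, "pol", 285.1),
    ("F1", "F", 2, "inf", 259.7),
    ("F1", "F", 3, "pol", 203.9),
    ("F1", "F", 3, "inf", 286.9),
    ("F1", "F", 4, "pol", 250.8),
    ("F1", "F", 4, "inf", 276.8),
]


def nested_label(outer, inner):
    """Hypothetical helper: label for `inner` nested within `outer` (like lme4's subject:scenario)."""
    return f"{outer}:{inner}"


# Scenario within subject: "scenario 1" of F1 and "scenario 1" of another subject become distinct units.
labels = [nested_label(subject, scenario) for subject, _, scenario, _, _ in rows]
print(sorted(set(labels)))  # ['F1:1', 'F1:2', 'F1:3', 'F1:4']
```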

As always, you're entitled to a full refund of the money you paid to use statsmodels ...