Statsmodels gives different ANOVA results to SPSS

Question

I'm getting acquainted with Statsmodels so that I can shift my more complicated stats completely over to Python. However, I'm being cautious, so I'm cross-checking my results against SPSS, just to make sure I'm not making any obvious blunders. Most of the time there's no difference, but I have one example of a two-way ANOVA that's throwing up very different test statistics in Statsmodels and SPSS. (Relevant point: the sample sizes in the ANOVA are mismatched, so ANOVA may not be the appropriate model here.)

I'm selecting my model as follows:

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# filepath is the path to the CSV file holding the data
Body = pd.read_csv(filepath)

# Drop rows with missing values
Body = Body.dropna()

# Two-way model: both main effects plus their interaction
Body_lm = ols('Effect ~ C(Fiction) + C(Condition) + C(Fiction)*C(Condition)', data = Body).fit()

# ANOVA table with Type II sums of squares
table = sm.stats.anova_lm(Body_lm, typ=2)
print(table)

The Statsmodels output is below:

                            sum_sq     df           F        PR(>F)
C(Fiction)               278.176684    1.0  307.624463  1.682042e-55
C(Condition)               4.294764    1.0    4.749408  2.971278e-02
C(Fiction):C(Condition)   10.776312    1.0   11.917092  5.970123e-04
Residual                 520.861599  576.0         NaN           NaN

The corresponding SPSS results are these:

Can anyone help explain the difference? Is it perhaps the unequal sample sizes being treated differently under the hood? Or am I choosing the wrong model?

Any help is appreciated!

Recommended answer

You should use sum coding when comparing the means of the variables. BTW, you don't need to list each variable in the interaction term separately if the * operator is used:

":" adds a new column to the design matrix with the product of the other two columns.
"*" will also include the individual columns that were multiplied together.

Your model should be:

Body_lm = ols('Effect ~ C(Fiction, Sum)*C(Condition, Sum)', data = Body).fit()
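As a rough illustration of what ":" and "*" do in a formula, the sketch below asks patsy (the formula library Statsmodels has traditionally relied on) for the design-matrix column names. The toy columns `x1` and `x2` and the helper `design_columns` are made up for this example; the columns are numeric so that the interaction column is literally their product.

```python
import pandas as pd
from patsy import dmatrix

# Toy numeric data (hypothetical), used only to inspect the design-matrix columns.
toy = pd.DataFrame({"x1": [1.0, 2.0, 3.0, 4.0], "x2": [0.5, 1.5, 2.5, 3.5]})


def design_columns(formula):
    """Return the column names patsy builds for a right-hand-side formula."""
    return dmatrix(formula, data=toy).design_info.column_names


colon_cols = design_columns("x1:x2")              # interaction column only
star_cols = design_columns("x1*x2")               # main effects plus interaction
verbose_cols = design_columns("x1 + x2 + x1*x2")  # the spelling used in the question

print(colon_cols)
print(star_cols)
print(verbose_cols == star_cols)
```

If this holds for the real data too, the longer formula in the question already fits the same terms as the shorter "*" form, so the redundancy is harmless and is not the source of the discrepancy.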
