如何使用rstanarm以APA样式报告贝叶斯线性(混合)模型? [英] How to report with APA style a Bayesian Linear (Mixed) Models using rstanarm?

查看:203
本文介绍了如何使用rstanarm以APA样式报告贝叶斯线性(混合)模型?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我目前正在努力按照APA-6的建议报告rstanarm::stan_lmer()的输出.

I'm currently struggling with how to report, following APA-6 recommendations, the output of rstanarm::stan_lmer().

首先,我将在混合型方法中采用混合模型,然后将尝试使用贝叶斯框架进行相同的处理.

First, I'll fit a mixed model within the frequentist approach, then will try to do the same using the bayesian framework.

这是获取数据的可复制代码:

Here's the reproducible code to get the data:

library(tidyverse)
library(neuropsychology)
library(rstanarm)
library(lmerTest)

df <- neuropsychology::personality %>% 
  select(Study_Level, Sex, Negative_Affect) %>% 
  mutate(Study_Level=as.factor(Study_Level),
         Negative_Affect=scale(Negative_Affect)) # I understood that scaling variables is important

现在,让我们以传统"方式拟合线性混合模型,以学习水平(受​​教育年限)作为随机因素,测试性别(男性/女性)对负面影响(负面情绪)的影响.

Now, let's fit a linear mixed model in the "traditional" way to test the impact of Sex (male/female) on Negative Affect (negative mood) with the study level (years of education) as random factor.

fit <- lmer(Negative_Affect ~ Sex + (1|Study_Level), df)
summary(fit)

输出如下:

Linear mixed model fit by REML t-tests use Satterthwaite approximations to degrees of
  freedom [lmerMod]
Formula: Negative_Affect ~ Sex + (1 | Study_Level)
   Data: df

REML criterion at convergence: 3709

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.58199 -0.72973  0.02254  0.68668  2.92841 

Random effects:
 Groups      Name        Variance Std.Dev.
 Study_Level (Intercept) 0.04096  0.2024  
 Residual                0.94555  0.9724  
Number of obs: 1327, groups:  Study_Level, 8

Fixed effects:
              Estimate Std. Error         df t value Pr(>|t|)    
(Intercept)    0.01564    0.08908    4.70000   0.176    0.868    
SexM          -0.46667    0.06607 1321.20000  -7.064 2.62e-12 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
     (Intr)
SexM -0.149

要进行报告,我想说:我们建立了一个线性混合模型,其中以负面影响作为结果变量,以性别作为预测变量,并以随机效应输入研究水平.在该模型中,男性水平导致显着降低的负面影响(β= -0.47,t(1321)=-7.06,p <.001).

To report it, I would say that "we fitted a linear mixed model with negative affect as outcome variable, sex as predictor and study level was entered as a random effect. Within this model, the male level led to a significant decrease of negative affect (beta = -0.47, t(1321)=-7.06, p < .001).

对吗?

然后,让我们尝试使用rstanarm在贝叶斯框架内拟合模型:

Then, let's try to fit the model within a bayesian framework using rstanarm:

fitB <- stan_lmer(Negative_Affect ~ Sex + (1|Study_Level),
                  data=df,
                  prior=normal(location=0, scale=1), 
                  prior_intercept=normal(location=0, scale=1),
                  prior_PD=F)
print(fitB, digits=2)

这将返回:

stan_lmer
 family:  gaussian [identity]
 formula: Negative_Affect ~ Sex + (1 | Study_Level)
------

Estimates:
            Median MAD_SD
(Intercept)  0.02   0.10 
SexM        -0.47   0.07 
sigma        0.97   0.02 

Error terms:
 Groups      Name        Std.Dev.
 Study_Level (Intercept) 0.278   
 Residual                0.973   
Num. levels: Study_Level 8 

Sample avg. posterior predictive 
distribution of y (X = xbar):
         Median MAD_SD
mean_PPD 0.00   0.04  

------
For info on the priors used see help('prior_summary.stanreg').

我认为median是系数的后验分布的中位数,而mad_sd是标准偏差的等效项.这些参数接近常人模型的beta和标准误差,这令人放心.但是,我不知道如何将输出形式化并用文字表达.

I think than median is the median of the posterior distribution of the coefficient and mad_sd the equivalent of standart deviation. These parameters are close to the beta and standart error of the frequentist model, which is reassuring. However, I do not know how to formalize and put the output in words.

此外,如果我对模型(summary(fitB, probs=c(.025, .975), digits=2))进行汇总,则会得到后验分布的其他特征:

Moreover, if I do the summary of the model (summary(fitB, probs=c(.025, .975), digits=2)), I get other features of the posterior distribution:

...
Estimates:
                                             mean     sd       2.5%     97.5% 
(Intercept)                                    0.02     0.11    -0.19     0.23
SexM                                          -0.47     0.07    -0.59    -0.34
...

像下面的东西好吗?

"我们在贝叶斯框架内拟合了线性混合模型,其中负影响作为结果变量,性别作为预测变量,研究水平作为随机影响输入.系数和截距的先验值设为正常(平均值= 0, sd = 1).在此模型中,与男性水平相关的系数的后验分布特征表明负面影响有所减少(平均值= -0.47,sd = 0.11,95%CI [-0.59,-0.34])

"we fitted a linear mixed model within the bayesian framework with negative affect as outcome variable, sex as predictor and study level was entered as a random effect. Priors for the coefficient and the intercept were set to normal (mean=0, sd=1). Within this model, the features of the posterior distribution of the coefficient associated with the male level suggest a decrease of negative affect (mean = -0.47, sd = 0.11, 95% CI[-0.59, -0.34]).

感谢您的帮助.

推荐答案

以下是心理学期刊可能接受或不接受的个人观点.

The following is personal opinion that may or may not be acceptable to a psychology journal.

要报告它,我会说我们拟合了线性混合模型,其中负效应作为结果变量,性别作为预测因子,而研究水平则作为随机效应输入.在此模型中,男性水平导致负面影响显着降低(β= -0.47,t(1321)=-7.06,p <.001).

To report it, I would say that "we fitted a linear mixed model with negative affect as outcome variable, sex as predictor and study level was entered as a random effect. Within this model, the male level led to a significant decrease of negative affect (beta = -0.47, t(1321)=-7.06, p < .001).

对吗?

从常客的角度来看,这被认为是正确的.

That is considered correct from a frequentist perspective.

从贝叶斯角度来看,关键概念是(当然,要以模型为条件)

The key concepts from a Bayesian perspective are that (conditional on the model, of course)

  1. 真实效果小于后中位数的概率为0.5,而真实效果大于后中位数的概率为0.5.频繁的人倾向于将后验中位数看作是数值上的最优值.
  2. posterior_interval函数在中位数附近产生可信区间,默认概率为0.9(尽管数字越小,边界的估计值越准确).因此,可以合理地说,真实效果出现在这些界限之间的概率为0.9.频繁的人倾向于将置信区间视为可信区间.
  3. as.data.frame功能将使您能够访问原始抽奖,因此mean(as.data.frame(fitB)$male > 0)得出在同一项研究中,男性和女性的预期结果差异为正的可能性.频繁的人倾向于将这些概率视为p值.
  1. There is a 0.5 probability that the true effect is less than the posterior median and a 0.5 probability that the true effect is greater than the posterior median. Frequentists tend to see a posterior median as being like a numerical optimum.
  2. The posterior_interval function yields credible intervals around the median with a default probability of 0.9 (although a lower number produces more accurate estimates of the bounds). So, you can legitimately say that there is a probability of 0.9 that the true effect is between those bounds. Frequentists tend to see confidence intervals as being like credible intervals.
  3. The as.data.frame function will give you access to the raw draws, so mean(as.data.frame(fitB)$male > 0) yields the probability that the expected difference in the outcome between men and women in the same study is positive. Frequentists tend to see these probabilities as being like p-values.

对于贝叶斯方法,我会说

For a Bayesian approach, I would say

我们使用马尔可夫链蒙特卡罗方法拟合线性模型,其中负面影响作为结果变量,性别作为预测变量,并且截距因研究水平而异.

We fit a linear model using Markov Chain Monte Carlo with negative affect as the outcome variable, sex as predictor and the intercept was allowed to vary by study level.

然后使用上面的三个概念来讨论估计值.

And then talk about the estimates using the three concepts above.

这篇关于如何使用rstanarm以APA样式报告贝叶斯线性(混合)模型?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆