Specifying random effects for repeated measures in logistic mixed model in R: lme4::glmer

Views: 113 Published: 2021/5/30 19:43:12 Tags: r logistic-regression lme4 mixed-models panel-data

Problem description

I am looking for feedback to determine how to correctly specify random effects to account for correlation in a repeated measures design, but with multiple levels of correlation (including the data being longitudinal for each combination of predictors). The outcome is binary, so I will be fitting a logistic mixed model. I was planning to use the glmer() function from the lme4 package. If you're wondering how these data arise, one example is from an eye tracker: people's eyes are "tracked" for 30 seconds, e.g., under different levels of the predictors, determining if they looked at a certain object on the screen or not (hence the binary outcome).

Study design (which can be seen by processing the code under "Dummy dataset" below in R):

- The outcome (Binary_outcome) is binary.
- There are repeated measures: each subject's binary response is recorded multiple times within each combination of predictors (see "Dummy dataset" below for structure).
- One between-subjects factor, Sex (male/female).
- One within-subjects factor, Intervention (pre/post).
- Note there are 12 possible trials a person could be assigned. Thus, not every subject is in all 12 trials, but rather a random set of 6 trials.
- Trial is not a variable of interest. It is merely thought that observations within an individual, within a trial could be more alike, and thus Trial should also be accounted for as a form of cluster correlation.

Dummy dataset: Shows the general structure of my data (although this is not the actual dataset):
```
data01 <- structure(list(Subject = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Trial = c("A", "A", 
"A", "B", "B", "B", "C", "C", "C", "D", "D", "D", "E", "E", "E", 
"F", "F", "F", "G", "G", "G", "E", "E", "E", "D", "D", "D", "A", 
"A", "A", "J", "J", "J", "L", "L", "L"), Intervention = c("Pre", "Pre", "Pre", "Pre", 
"Pre", "Pre", "Pre", "Pre", "Pre", "Post", "Post", "Post", "Post", 
"Post", "Post", "Post", "Post", "Post", "Pre", "Pre", "Pre", 
"Pre", "Pre", "Pre", "Pre", "Pre", "Pre", "Post", "Post", "Post", 
"Post", "Post", "Post", "Post", "Post", "Post"), Sex = c("Female", 
"Female", "Female", "Female", "Female", "Female", "Female", "Female", 
"Female", "Female", "Female", "Female", "Female", "Female", "Female", 
"Female", "Female", "Female", "Male", "Male", "Male", "Male", 
"Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male", 
"Male", "Male", "Male", "Male", "Male", "Male"), Binary_outcome = c(1L, 
1L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 
1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 
1L, 1L, 1L)), class = "data.frame", row.names = c(NA, -36L))
```
  Current code being used: This is what I'm using currently, but I do not know if I should be specifying the random effects differently based on the structure of the data (outlined below under "Accounting correctly for correlation").
```
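# install.packages("lme4")  # run once if lme4 is not yet installed
library(lme4)

# Logistic mixed model with separate random intercepts for Trial and Subject
logit_model <- glmer(Binary_outcome ~ factor(Sex) * factor(Intervention) +
                       (1 | Trial) +
                       (1 | Subject),
                     data = data01,
                     family = "binomial")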
```
  Accounting correctly for correlation: This is where my question lies. Comments/Questions:
- I believe both the Subject and Trial random effects are crossed (not nested), because Subject 1 is always Subject 1, and Trial A is always Trial A. There is no way to re-number/re-letter these as you could if the design were nested (see, e.g.: https://stats.stackexchange.com/questions/228800/crossed-vs-nested-random-effects-how-do-they-differ-and-how-are-they-specified).
- As can be seen above under "Current code being used," I have included the fixed effects of interest (Sex, Intervention, and Sex*Intervention), and random intercepts for Trial and Subject using + (1 | Trial) + (1 | Subject).
  - Does + (1 | Trial) + (1 | Subject) correctly "tell" the model to account for the correlation within a person, within a trial, or does this need to be specified in another way? Even though I don't think the random effects are nested, it still feels like there's a "hierarchy," but maybe this is already accounted for by + (1 | Trial) + (1 | Subject).
  - These data seem unique in that, even within a trial, there are multiple measurements (0s/1s) for each subject. I am unsure of the implications of this with regard to the model fitting.
  - Do I need to further tell the model to differentiate the within- and between-subjects fixed effects? Or does the code "pick up" on this "automatically" with + (1 | Trial) + (1 | Subject)? It correctly does this when you simply specify a random intercept for subject in lme() with + (1 | Subject), or aov() with + Error(Subject), for example. This is why I simply used + (1 | Trial) + (1 | Subject) here.
      
I am looking for your feedback, and preferably also the reference(s) (texts, peer-reviewed papers) used to determine your feedback. I have multiple texts on logistic regression, broader categorical data analysis, and mixed models, but, as far as I can tell, none of them bring together the ideas I have posed here. Thus, knowing of a resource that is particularly useful for this situation would also be helpful.
      
Recommended answer
      
      (1|Trial) + (1|Subject) is reasonable: it specifies variation among trials, and among subjects. The effects are indeed crossed: if you only wanted to allow variation among trials within subjects you'd use (1|Subject/Trial); for variation among subjects within trials you'd use (1|Trial/Subject). Since you have multiple observations per trial:subject combination you could use (1|Trial) + (1|Subject) + (1|Subject:Trial) to allow for another level of variation, but I have an alternative suggestion (see below).
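One quick way to see the crossing described above is to tabulate Subject by Trial. The following is a minimal base-R sketch, assuming a compact reconstruction of the question's dummy dataset (only the Subject and Trial columns are rebuilt); the names `cell_counts` and `shared_trials` are illustrative and do not come from the original post.

```r
# Compact reconstruction of the dummy data: 2 subjects, 6 trials each,
# 3 binary observations per Subject:Trial cell (outcome column omitted)
data01 <- data.frame(
  Subject = rep(1:2, each = 18),
  Trial = c(rep(c("A", "B", "C", "D", "E", "F"), each = 3),
            rep(c("G", "E", "D", "A", "J", "L"), each = 3))
)

# Number of observations in each Subject:Trial cell
cell_counts <- xtabs(~ Subject + Trial, data = data01)
print(cell_counts)

# Trials that occur under more than one subject: the same trial label
# recurs across subjects, which is what "crossed" means here
shared_trials <- colnames(cell_counts)[colSums(cell_counts > 0) > 1]
print(shared_trials)
```

In this reconstruction, trials A, D and E appear under both subjects, and every non-empty cell holds 3 observations. That repeated-cell structure is what makes the extra (1|Subject:Trial) term estimable in principle.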
      
      I believe the maximal model corresponding to this design is
      
      Binary_outcome ~ Sex*Intervention + cor(Trial | Subject) + (1|Trial)
      
Here cor() expresses a correlation matrix, i.e. we are not trying to estimate the variation across repeated measures within the same trial for each subject, because we don't have that information. The term (1|Trial) expresses the variation across trials that is common to all subjects, while cor(Trial|Subject) expresses the correlation across trials within subjects. However, while it's a useful exercise to try to identify what the maximal model would be, it's not practical here for two reasons: (1) estimating a full correlation matrix across trials would require (n*(n-1)/2 = 12*11/2 =) 66 parameters, which won't be possible without a giant data set and a giant computer; (2) few of the available mixed-model tools in R provide the flexibility to constrain a random effect to a correlation matrix (MCMCglmm does, and some of the other Bayesian tools such as brms might; glmmTMB could be extended fairly easily, and lme4 could be hacked ...).
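The parameter count in reason (1) is easy to check. This small sketch uses illustrative variable names; the second quantity (an unstructured covariance matrix, which adds one variance per trial) is included only for comparison and is not a figure from the answer.

```r
n_trials <- 12

# Distinct off-diagonal entries of a 12 x 12 correlation matrix
n_correlations <- n_trials * (n_trials - 1) / 2
print(n_correlations)

# For comparison: an unstructured covariance matrix also has 12 variances
n_covariance_params <- n_trials * (n_trials + 1) / 2
print(n_covariance_params)
```

These evaluate to 66 and 78 respectively, so either form of the maximal model asks far more of the data than two scalar variance components do.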
      
- There is no need to code the "level" of fixed effects (within- vs between-) explicitly.
- Lack of balance and/or lack of complete crossing will reduce your power for a given sample size, but is not otherwise a problem (this is one of the big advantages of mixed model approaches).
- It sounds like the multiple observations per subject:trial combination are exchangeable (i.e. you can treat them all as samples from the same distribution, with the same expected value etc.; an exception to this would be if you wanted to take account of order of observations within subject:trial, e.g. a trend in accuracy over time). In this case, you're better off aggregating and doing a binomial regression, treating each subject:trial combination as "m successes out of N trials" rather than "{1,0,1,1,1,0,0,1}".
- For small effective sample sizes per cluster (i.e. if there are a fairly small number of total binary observations per subject), you need to be careful about some of the technical details: the accuracy of the widely used Laplace approximation (used by lme4, glmmTMB, INLA, ...) may be poor. Unfortunately, other than going Bayesian, you don't have a lot of options here: adaptive Gauss-Hermite quadrature (lme4, GLMMadaptive) is rarely implemented/available for problems with multiple random effects.
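The aggregation suggested in the third point can be sketched as follows. This is a minimal example on simulated data, not the asker's dataset: the number of subjects, the 5 observations per cell, the effect sizes, and the names `sim`, `agg`, `n_obs` and `fit_agg` are all assumptions made for illustration.

```r
library(lme4)

set.seed(101)

# Hypothetical design: 40 subjects, each assigned 6 of 12 trials
# (3 pre, 3 post), with 5 binary observations per Subject:Trial cell
n_subj <- 40
trial_ids <- LETTERS[1:12]
sim <- do.call(rbind, lapply(seq_len(n_subj), function(s) {
  data.frame(
    Subject = s,
    Sex = if (s %% 2 == 0) "Male" else "Female",
    Trial = rep(sample(trial_ids, 6), each = 5),
    Intervention = rep(c("Pre", "Post"), each = 15)
  )
}))

# Simulate outcomes with subject and trial random intercepts (assumed values)
subj_eff <- rnorm(n_subj, sd = 0.8)
trial_eff <- setNames(rnorm(12, sd = 0.5), trial_ids)
eta <- -0.3 + 0.6 * (sim$Intervention == "Post") +
  subj_eff[sim$Subject] + trial_eff[sim$Trial]
sim$Binary_outcome <- rbinom(nrow(sim), size = 1, prob = plogis(eta))

# Collapse to one row per Subject:Trial cell: successes and number of observations
sim$n_obs <- 1L
agg <- aggregate(cbind(Binary_outcome, n_obs) ~ Subject + Sex + Intervention + Trial,
                 data = sim, FUN = sum)
names(agg)[names(agg) == "Binary_outcome"] <- "successes"

# Binomial mixed model on the aggregated counts (successes, failures)
fit_agg <- glmer(cbind(successes, n_obs - successes) ~ Sex * Intervention +
                   (1 | Trial) + (1 | Subject),
                 data = agg, family = binomial)
print(fixef(fit_agg))
```

The aggregated data frame has one row per subject:trial cell (240 rows here instead of 1200), which is usually faster to fit and makes the "m successes out of N" structure explicit. With the asker's real data, the same `aggregate()` step would be applied to `data01`.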
      