The outcome (Binary_outcome) is binary.
- There are repeated measures: each subject's binary response is recorded multiple times within each combination of predictors (see "Dummy dataset" below for structure).
- One between-subjects factor, Sex (male/female).
- One within-subjects factor, Intervention (pre/post).
- Note there are 12 possible trials a person could be assigned. Thus, not every subject is in all 12 trials, but rather a random set of 6 trials.
- Trial is not a variable of interest. It is merely thought that observations within an individual, within a trial could be more alike, and thus Trial should also be accounted for as a form of cluster correlation.
Dummy dataset: Shows the general structure of my data (although this is not the actual dataset):
data01 <- structure(list(Subject = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Trial = c("A", "A",
"A", "B", "B", "B", "C", "C", "C", "D", "D", "D", "E", "E", "E",
"F", "F", "F", "G", "G", "G", "E", "E", "E", "D", "D", "D", "A",
"A", "A", "J", "J", "J", "L", "L", "L"), Intervention = c("Pre", "Pre", "Pre", "Pre",
"Pre", "Pre", "Pre", "Pre", "Pre", "Post", "Post", "Post", "Post",
"Post", "Post", "Post", "Post", "Post", "Pre", "Pre", "Pre",
"Pre", "Pre", "Pre", "Pre", "Pre", "Pre", "Post", "Post", "Post",
"Post", "Post", "Post", "Post", "Post", "Post"), Sex = c("Female",
"Female", "Female", "Female", "Female", "Female", "Female", "Female",
"Female", "Female", "Female", "Female", "Female", "Female", "Female",
"Female", "Female", "Female", "Male", "Male", "Male", "Male",
"Male", "Male", "Male", "Male", "Male", "Male", "Male", "Male",
"Male", "Male", "Male", "Male", "Male", "Male"), Binary_outcome = c(1L,
1L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L,
1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 1L,
1L, 1L, 1L)), class = "data.frame", row.names = c(NA, -36L))
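One quick way to see which trials each subject received, and how many binary observations sit in each subject:trial cell, is to cross-tabulate the two grouping variables. The sketch below rebuilds the dummy dataset compactly (the name data01 is taken from the model code; the rep() construction is only a shorthand for the dput() output above) and uses base R only.

```r
# Compact reconstruction of the dummy dataset shown above
data01 <- data.frame(
  Subject = rep(1:2, each = 18),
  Trial = rep(c("A", "B", "C", "D", "E", "F",    # subject 1's six trials
                "G", "E", "D", "A", "J", "L"),   # subject 2's six trials
              each = 3),
  Intervention = rep(rep(c("Pre", "Post"), each = 9), times = 2),
  Sex = rep(c("Female", "Male"), each = 18),
  Binary_outcome = c(1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1,
                     1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1)
)

# Rows = subjects, columns = trials, cells = number of observations
tab <- xtabs(~ Subject + Trial, data = data01)
print(tab)

# Trials that appear under more than one subject (evidence of partial crossing)
shared <- colnames(tab)[colSums(tab > 0) > 1]
print(shared)
```

Empty cells in the table are trials a subject never received; non-empty cells show the replicate observations per subject:trial combination.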
Current code being used: This is what I'm using currently, but I do not know if I should be specifying the random effects differently based on the structure of the data (outlined below under "Accounting correctly for correlation").
install.packages("lme4")
library(lme4)
logit_model <- glmer(Binary_outcome ~ factor(Sex) * factor(Intervention) +
                       (1 | Trial) +
                       (1 | Subject),
                     data = data01,
                     family = "binomial")
Accounting correctly for correlation: This is where my question lies. Comments/Questions:
- I believe both the Subject and Trial random effects are crossed (not nested), because Subject 1 is always Subject 1, and Trial A is always Trial A. There is no way to re-number/re-letter these as you could if the design were nested (see, e.g.: https://stats.stackexchange.com/questions/228800/crossed-vs-nested-random-effects-how-do-they-differ-and-how-are-they-specified).
- As can be seen above under "Current code being used," I have included the fixed effects of interest (Sex, Intervention, and Sex*Intervention), and random intercepts for Trial and Subject using + (1 | Trial) + (1 | Subject).
- Does + (1 | Trial) + (1 | Subject) correctly "tell" the model to account for the correlation within a person, within a trial, or does this need to be specified in another way? Even though I don't think the random effects are nested, it still feels like there's a "hierarchy," but maybe this is already accounted for by + (1 | Trial) + (1 | Subject).
- These data seem unique in that, even within a trial, there are multiple measurements (0s/1s) for each subject. I am unsure of the implications of this with regard to the model fitting.
- Do I need to further tell the model to differentiate the within- and between-subjects fixed effects? Or does the code "pick up" on this "automatically" with + (1 | Trial) + (1 | Subject)? It correctly does this when you simply specify a random intercept for subject in lme() with + (1 | Subject), or aov() with + Error(Subject), for example. This is why I simply used + (1 | Trial) + (1 | Subject) here.
I am looking for your feedback, and preferably also the reference(s) (texts, peer-reviewed papers) used to determine your feedback. I have multiple texts on logistic regression, broader categorical data analysis, and mixed models, but, as far as I can tell, none of them bring together the ideas I have posed here. Thus, knowing of a resource that is particularly useful for this situation would also be helpful.
Recommended answer
(1|Trial) + (1|Subject) is reasonable: it specifies variation among trials, and among subjects. The effects are indeed crossed: if you only wanted to allow variation among trials within subjects you'd use (1|Subject/Trial); for variation among subjects within trials you'd use (1|Trial/Subject). Since you have multiple observations per trial:subject combination you could use (1|Trial) + (1|Subject) + (1|Subject:Trial) to allow for another level of variation, but I have an alternative suggestion (see below).
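As a rough illustration of the crossed specification and the optional subject:trial term, the sketch below simulates data with the same layout as the question (12 possible trials, a random 6 per subject, 3 binary observations per subject:trial cell) and fits both models with lme4::glmer. The number of subjects, the effect sizes, and the object names (sim, fit_crossed, fit_cell) are all assumptions made up for the example, not values from the question.

```r
library(lme4)

set.seed(101)
n_subj <- 40                 # hypothetical number of subjects
trials <- LETTERS[1:12]      # 12 possible trials

# Each subject gets a random 6 of the 12 trials (3 "Pre", then 3 "Post"),
# with 3 binary observations per subject:trial combination.
sim <- do.call(rbind, lapply(seq_len(n_subj), function(s) {
  data.frame(
    Subject = s,
    Sex = if (s %% 2 == 0) "Male" else "Female",
    Trial = rep(sample(trials, 6), each = 3),
    Intervention = rep(c("Pre", "Post"), each = 9)
  )
}))

# Hypothetical true effects on the logit scale
subj_eff <- rnorm(n_subj, sd = 1)
trial_eff <- setNames(rnorm(length(trials), sd = 0.5), trials)
eta <- -0.3 + 0.8 * (sim$Intervention == "Post") +
  subj_eff[sim$Subject] + trial_eff[sim$Trial]
sim$Binary_outcome <- rbinom(nrow(sim), size = 1, prob = plogis(eta))
sim$Subject <- factor(sim$Subject)

# Crossed random intercepts for Trial and Subject
fit_crossed <- glmer(Binary_outcome ~ Sex * Intervention +
                       (1 | Trial) + (1 | Subject),
                     data = sim, family = binomial)

# Same model plus an extra level of variation for each subject:trial cell
fit_cell <- update(fit_crossed, . ~ . + (1 | Subject:Trial))

print(VarCorr(fit_cell))
print(anova(fit_crossed, fit_cell))
```

Because the simulated data contain no true subject:trial variance, the second fit may report a singular (boundary) fit; that is expected here and is itself a hint that the extra term is not needed for these data.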
I believe the maximal model corresponding to this design is
Binary_outcome ~ Sex*Intervention + cor(Trial | Subject) + (1|Trial)
where cor() expresses a correlation matrix, i.e. we are not trying to estimate the variation across repeated measures within the same trial for each subject, because we don't have that information. Here (1|Trial) expresses the variation across trials that is common to all subjects, while cor(Trial|Subject) expresses the correlation across trials within subjects. However, while it's a useful exercise to try to identify what the maximal model would be, it's not practical here for two reasons: (1) estimating a full correlation matrix across trials would require (n*(n-1)/2 = 12*11/2 =) 66 parameters, which won't be possible without a giant data set and a giant computer; (2) few of the available mixed-model tools in R provide the flexibility to constrain a random effect to a correlation matrix (MCMCglmm does, and some of the other Bayesian tools such as brms might; glmmTMB could be extended fairly easily, and lme4 could be hacked ...).
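The parameter count quoted above is just the number of distinct off-diagonal entries in a 12 x 12 correlation matrix. A minimal check in base R (the variable names are illustrative only):

```r
n_trials <- 12

# Distinct pairwise correlations among 12 trials: n * (n - 1) / 2
n_cor <- n_trials * (n_trials - 1) / 2
print(n_cor)

# Same count via the binomial coefficient "12 choose 2"
print(choose(n_trials, 2))
```

By comparison, the (1|Trial) + (1|Subject) model estimates only two variance parameters for the random effects.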
- There is no need to code the "level" of fixed effects (within- vs between-) explicitly.
- Lack of balance and/or lack of complete crossing will reduce your power for a given sample size, but is not otherwise a problem (this is one of the big advantages of mixed model approaches).
- It sounds like the multiple observations per subject:trial combination are exchangeable, i.e. you can treat them all as samples from the same distribution, with the same expected value etc. (an exception would be if you wanted to take account of the order of observations within subject:trial, e.g. a trend in accuracy over time). In this case, you're better off aggregating and doing a binomial regression, treating each subject:trial combination as "m successes out of N trials" rather than "{1,0,1,1,1,0,0,1}".
- For small effective sample sizes per cluster (i.e. if there are a fairly small number of total binary observations per subject), you need to be careful about some of the technical details: the accuracy of the widely used Laplace approximation (used by lme4, glmmTMB, INLA, ...) may be poor. Unfortunately, other than going Bayesian, you don't have a lot of options here: adaptive Gauss-Hermite quadrature (lme4, GLMMadaptive) is rarely implemented/available for problems with multiple random effects.
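The aggregation suggested in the list above can be sketched in base R as follows. The dummy dataset is rebuilt compactly so the block runs on its own; the column names successes, total, and failures, and the object names agg and agg_formula, are invented for the example.

```r
# Compact reconstruction of the dummy dataset from the question
data01 <- data.frame(
  Subject = rep(1:2, each = 18),
  Trial = rep(c("A", "B", "C", "D", "E", "F",
                "G", "E", "D", "A", "J", "L"), each = 3),
  Intervention = rep(rep(c("Pre", "Post"), each = 9), times = 2),
  Sex = rep(c("Female", "Male"), each = 18),
  Binary_outcome = c(1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1,
                     1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1)
)

# One row per subject:trial combination: count successes and observations
keys <- Binary_outcome ~ Subject + Trial + Intervention + Sex
successes <- aggregate(keys, data = data01, FUN = sum)
totals <- aggregate(keys, data = data01, FUN = length)

agg <- successes
names(agg)[names(agg) == "Binary_outcome"] <- "successes"
agg$total <- totals$Binary_outcome
agg$failures <- agg$total - agg$successes
print(agg)

# Binomial response as a two-column (successes, failures) matrix
agg_formula <- cbind(successes, failures) ~ Sex * Intervention +
  (1 | Trial) + (1 | Subject)
```

With real data, the formula could then be passed to lme4::glmer(agg_formula, data = agg, family = binomial). Note that after aggregation a (1 | Subject:Trial) term would have one level per row, so it would act as an observation-level random effect.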