Get number of data in each factor level (as well as interaction) from a fitted lm or glm [R]


Question

I have a logistic regression model in R, where all of the predictor variables are categorical rather than continuous (in addition to the response variable, which is obviously also categorical/binary).

When calling summary(model_name), is there a way to include a column giving the number of observations within each factor level?

Recommended answer

"I have a logistic regression model in R, where all of the predictor variables are categorical rather than continuous."

If all your covariates are factors (the intercept aside), this is fairly easy: with the default treatment contrasts the model matrix contains only 0s and 1s, and the number of 1s in a column is the number of observations at that factor level (or interaction level) in your data. So just do colSums(model.matrix(your_glm_model_object)).

Since a model matrix has column names, colSums gives you a vector with a "names" attribute that is consistent with the names of coef(your_glm_model_object).

The same solution applies to a linear model (fitted by lm) and to a generalized linear model (fitted by glm) with any distribution family.

Here is a simple example:

set.seed(0)
f1 <- sample(gl(2, 50))  ## a factor with 2 levels, each with 50 observations
f2 <- sample(gl(4, 25))  ## a factor with 4 levels, each with 25 observations
y <- rnorm(100)
fit <- glm(y ~ f1 * f2)  ## or use `lm`, since the default family here is `gaussian()`
colSums(model.matrix(fit))
#(Intercept)         f12         f22         f23         f24     f12:f22 
#        100          50          25          25          25          12 
#    f12:f23     f12:f24 
#         12          14 

Here, we have 100 observations / complete cases (the count shown under (Intercept)).
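To put these counts next to the estimates, as the question asks, one option is to bind them onto the coefficient table that summary returns. This is a minimal sketch rather than a built-in feature of summary: the data frame `dat` and the names `n` and `tab` are assumptions made for illustration. The counts are matched by row name, so a coefficient that is dropped from the table (an aliased one, for example) should not misalign the rows.

```r
## Illustrative data, generated the same way as in the example above
## (`dat` is a hypothetical name, not part of the original answer).
set.seed(0)
dat <- data.frame(f1 = sample(gl(2, 50)),
                  f2 = sample(gl(4, 25)),
                  y  = rnorm(100))
fit <- glm(y ~ f1 * f2, data = dat)

## Number of observations behind each coefficient
n <- colSums(model.matrix(fit))

## Coefficient table from summary(), with the counts appended by name
tab <- coef(summary(fit))
tab <- cbind(tab, n = n[rownames(tab)])
tab
```

The result is an ordinary numeric matrix, so it can be printed or rounded like the usual coefficient table.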

"Is there a way to display the count for the baseline level of each factor?"

Baseline levels are absorbed by the contrasts, so they don't appear in the model matrix used for fitting. However, we can generate the full model matrix (without contrasts) from your formula rather than from your fitted model (this also gives you a way to drop numeric variables if your model has them):

## `contrasts = FALSE` returns an identity matrix, i.e. one indicator column per level
SET_CONTRAST <- list(f1 = contr.treatment(nlevels(f1), contrasts = FALSE),
                     f2 = contr.treatment(nlevels(f2), contrasts = FALSE))
X <- model.matrix(~ f1 * f2, contrasts.arg = SET_CONTRAST)
colSums(X)
#(Intercept)         f11         f12         f21         f22         f23 
#        100          50          50          25          25          25 
#        f24     f11:f21     f12:f21     f11:f22     f12:f22     f11:f23 
#         25          13          12          13          12          13 
#    f12:f23     f11:f24     f12:f24 
#         12          11          14 

Note that setting the contrasts by hand quickly becomes tedious when you have many factor variables.
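One way to reduce that tedium is to build the contrast list programmatically from the model frame. The sketch below assumes the model was fitted with a `data` argument (the data frame `dat` is hypothetical) and uses `contrasts(x, contrasts = FALSE)`, which returns the full indicator coding for a factor; `mf`, `is_fac` and `full_contrasts` are illustrative names.

```r
## Illustrative data and fit (`dat` is a hypothetical name)
set.seed(0)
dat <- data.frame(f1 = sample(gl(2, 50)),
                  f2 = sample(gl(4, 25)),
                  y  = rnorm(100))
fit <- glm(y ~ f1 * f2, data = dat)

## Model frame: exactly the rows and variables used in the fit
mf <- model.frame(fit)

## Pick out the factor columns and build a full (no-contrast) coding for each
is_fac <- vapply(mf, is.factor, logical(1))
full_contrasts <- lapply(mf[is_fac], contrasts, contrasts = FALSE)

## Full model matrix, including the baseline levels
X <- model.matrix(formula(fit), data = mf, contrasts.arg = full_contrasts)
colSums(X)
```

Because the list is derived from the model frame, adding another factor to the formula needs no change to this code.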

model.matrix is definitely not the only approach for this. The conventional way may be

table(f1)
table(f2)
table(f1, f2)

but this can get tedious too when your model becomes complicated.
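A possible middle ground is to tabulate all factors of the model frame in one call and add the margins, so that cell counts and per-level counts come out together. This is only a sketch under the same assumptions as before (a hypothetical data frame `dat`, illustrative names `mf`, `fac` and `counts`); it uses `xtabs`, `reformulate` and `addmargins` from base R.

```r
## Illustrative data and fit (`dat` is a hypothetical name)
set.seed(0)
dat <- data.frame(f1 = sample(gl(2, 50)),
                  f2 = sample(gl(4, 25)),
                  y  = rnorm(100))
fit <- glm(y ~ f1 * f2, data = dat)

## Use the model frame so that rows dropped during fitting are not counted
mf <- model.frame(fit)

## Names of all factor variables in the model
fac <- names(mf)[vapply(mf, is.factor, logical(1))]

## Cross-tabulate them (here ~ f1 + f2) and append row/column totals
counts <- addmargins(xtabs(reformulate(fac), data = mf))
counts
```

The "Sum" margins give the per-level counts of each factor, and the inner cells give the counts for each interaction level.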
