Get coefficients of all discrete states from glm in R

Question

I have a data frame with a discrete column that has the following states:

StateName  PX127857  PX128030  PX100049  PX100330  PX106316  PX115690  PX125484  PX112410  PX100778
Support           1         1         8         4         7         5         8        12        13

When I use

  model <- glm(formula, data = DATAFRAME, family = "binomial")

model$coefficients returns coefficients for only 8 of the 9 discrete states. For the state PX128030, I do not get any coefficient.

I think I understand why this might be happening, but is there a way to return NULL or 0 for states like PX128030, so that the order and count of model$coefficients match levels(dataframe$column)?

Recommended answer

This is really a basic statistics problem. When you put a categorical variable into a model alongside an intercept, you cannot estimate a separate effect for every level. You need a constraint to make the problem solvable. There are several ways to impose one, but the most common in R is to designate one level as the reference level; the coefficients for the other levels then measure how much each level differs from the reference level. So the effect of the reference level is not 0; it is simply tied up in the estimate of the intercept. By default, the reference level is the first level of the factor.
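To make the constraint concrete, here is a minimal sketch on made-up data (the data frame `dd`, the factor `f`, and the response `y` are hypothetical, not the asker's data). It shows that the default coding gives no column or coefficient to the reference level, and that dropping the intercept is another constraint that does yield one coefficient per level.

```r
# Hypothetical toy data: a three-level factor f and a numeric response y.
dd <- data.frame(
  f = factor(rep(c("a", "b", "c"), each = 2)),
  y = c(1, 3, 4, 6, 9, 11)
)

# Default treatment coding: level "a" is the reference level.
m_ref <- lm(y ~ f, data = dd)
names(coef(m_ref))             # "(Intercept)" "fb" "fc", no "fa"
colnames(model.matrix(m_ref))  # the design matrix has no column for "a" either

# A different constraint: drop the intercept to get one coefficient per level.
m_cell <- lm(y ~ 0 + f, data = dd)
coef(m_cell)                   # fa, fb, fc
```

Both fits describe the same model; only the meaning of the individual coefficients changes.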

I think the factor levels used in the fit are stored in the xlevels component of the model. Maybe a helper function like this would be of use:

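# Collect the coefficient for every level of every factor in model m.
# The reference level has no coefficient of its own, so it comes back as NA.
# Assumes default treatment contrasts, where names are <factor><level>.
levelvals <- function(m) {
    ml <- m$xlevels    # levels of each factor used in the fit
    fv <- lapply(names(ml), function(x) paste0(x, ml[[x]]))
    cf <- coefficients(m)
    r <- lapply(fv, function(v) structure(cf[v], names = v))
    names(r) <- names(ml)
    r
}
# dd: a data frame with a response y and a factor f
m <- lm(y ~ f, dd)
levelvals(m)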
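If the goal is a vector whose order and length match levels() exactly, with 0 for the reference level as the question asks, a variation along these lines might work. This is only a sketch: `coef_by_level` and the toy data are hypothetical, and it assumes a main-effect factor fitted with R's default treatment contrasts, so that the first level is the reference.

```r
# Hypothetical helper: one value per level of factor `var`, in levels() order.
# Assumes default treatment contrasts (first level is the reference level).
coef_by_level <- function(m, var) {
  lv <- m$xlevels[[var]]                          # levels used in the fit
  out <- setNames(coef(m)[paste0(var, lv)], lv)   # NA where no coefficient exists
  out[lv[1]] <- 0                                 # reference level: difference from itself
  out
}

# Made-up binary data with a three-level factor.
dd <- data.frame(
  f = factor(rep(c("a", "b", "c"), each = 4)),
  y = c(0, 0, 0, 1,  0, 1, 0, 1,  1, 1, 1, 0)
)
m <- glm(y ~ f, data = dd, family = "binomial")

by_level <- coef_by_level(m, "f")
by_level   # names match levels(dd$f); "a" is 0 by construction
```

The 0 here is a convention (the reference level differs from itself by nothing), not an estimate produced by glm.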

But make sure you interpret the parameters correctly. They are not the means for each level; they are the differences in means between each level and the reference level.
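As a quick check of that interpretation, the sketch below rebuilds the per-level means from the intercept and the differences. The data are made up, and the arithmetic only holds this simply for a model with a single factor; in a binomial glm the same sums would give log-odds per level rather than means.

```r
# Hypothetical toy data: a three-level factor f and a numeric response y.
dd <- data.frame(
  f = factor(rep(c("a", "b", "c"), each = 2)),
  y = c(1, 3, 4, 6, 9, 11)
)
m <- lm(y ~ f, data = dd)
cf <- coef(m)

# Intercept = mean of the reference level; other coefficients are differences from it.
implied_means <- cf[["(Intercept)"]] + c(0, cf[-1])
names(implied_means) <- levels(dd$f)

observed_means <- tapply(dd$y, dd$f, mean)
all.equal(as.numeric(implied_means), as.numeric(observed_means))
```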
