Regression for a Rate variable in R

Question

I was tasked with developing a regression model looking at student enrollment in different programs. This is a very nice, clean data set where the enrollment counts follow a Poisson distribution well. I fit a model in R (using both a GLM and a zero-inflated Poisson model). The resulting residuals seemed reasonable.

However, I was then instructed to change the count of students to a "rate", calculated as students / school_population (each school has its own population). This is no longer a count variable but a proportion between 0 and 1, considered the "proportion of enrollment" in a program.

This "rate" (students/population) is no longer Poisson, but is certainly not normal either. So, I'm a bit lost as to the appropriate distribution, and subsequent model to represent it.

A log-normal distribution seems to fit this rate well; however, I have many 0 values, so it won't actually fit (the log of 0 is undefined).

Any suggestions on the best form of distribution for this new parameter, and how to model it in R?

Thanks!

Recommended answer

As suggested in the comments, you could keep the Poisson model and handle the varying population with an offset:

glm(response ~ predictor1 + predictor2 + predictor3 + ... + offset(log(population)),
    family=poisson, data=...)
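
As a concrete illustration of the offset approach, here is a minimal sketch on simulated data. The data frame `school`, its columns, and the "true" coefficients are all made up for the example; only `glm()`, `offset()` and `predict()` are the real R functions the answer refers to. Because the offset enters the linear predictor with a fixed coefficient of 1, the model is effectively log(students / population) = intercept + slope * predictor1, so the coefficients describe the rate rather than the raw count.

```r
set.seed(1)
n <- 200

# Hypothetical data: one row per school
school <- data.frame(
  population = sample(200:2000, n, replace = TRUE),  # made-up school sizes
  predictor1 = rnorm(n)
)

# Simulate counts whose per-student rate is exp(-3 + 0.5 * predictor1)
school$students <- rpois(n, school$population * exp(-3 + 0.5 * school$predictor1))

# Poisson GLM for the count, with log(population) as an offset
fit_pois <- glm(students ~ predictor1 + offset(log(population)),
                family = poisson, data = school)
coef(fit_pois)

# Predicted rate (students per head of population): use population = 1,
# so that the offset contributes log(1) = 0
new_school <- data.frame(predictor1 = 0, population = 1)
predict(fit_pois, newdata = new_school, type = "response")
```

The estimated coefficients should land close to the simulated values of -3 and 0.5, which is a quick sanity check that the offset is doing what is intended.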

Or you could use a binomial GLM, written in either of the following two forms:

glm(cbind(response,pop_size-response) ~ predictor1 + ... , family=binomial,
        data=...)

glm(response/pop_size ~ predictor1 + ... , family=binomial,
        weights=pop_size,
        data=...)

The latter form is sometimes more convenient, although less widely used. Be aware that, in general, switching from Poisson to binomial will change the link function from log to logit, although you can use family=binomial(link="log") if you prefer.
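
To check that the two binomial forms above really describe the same model, here is a small sketch on simulated data. The data frame `d` and the simulated coefficients are hypothetical; the column names `response` and `pop_size` just mirror the answer's code.

```r
set.seed(2)
n <- 150

# Hypothetical data: enrolled students (response) out of pop_size per school
d <- data.frame(
  pop_size   = sample(50:500, n, replace = TRUE),
  predictor1 = rnorm(n)
)
d$response <- rbinom(n, d$pop_size, plogis(-2 + 0.4 * d$predictor1))

# Form 1: two-column response of (successes, failures)
fit_cbind <- glm(cbind(response, pop_size - response) ~ predictor1,
                 family = binomial, data = d)

# Form 2: observed proportion as the response, with the number of trials as weights
fit_prop <- glm(response / pop_size ~ predictor1,
                family = binomial, weights = pop_size, data = d)

# The coefficient estimates should agree up to numerical tolerance
all.equal(coef(fit_cbind), coef(fit_prop), tolerance = 1e-6)
```

The proportion form can be handy when the data already arrive as rates plus a denominator, which is the situation described in the question.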

Zero-inflation might be easier to model with the Poisson + offset combination, since zero-inflated Poisson models are more commonly available than zero-inflated binomial ones. (I'm not sure whether the pscl package, the most common approach to ZIP, handles offsets, but I think it does.)
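
For what it's worth, a zero-inflated Poisson fit with an offset might look like the sketch below. It assumes the pscl package is installed and that `zeroinfl()` accepts an `offset()` term in the count part of the formula, which is how I read its documentation; the data frame `zi`, the 30% structural-zero share, and the coefficients are simulated assumptions, not anything from the question.

```r
library(pscl)  # assumed to be installed; provides zeroinfl()

set.seed(3)
n <- 300

# Hypothetical data with 30% structural zeros (programs no student can enroll in)
zi <- data.frame(
  population = sample(200:2000, n, replace = TRUE),
  predictor1 = rnorm(n)
)
structural_zero <- rbinom(n, 1, 0.3) == 1
zi$students <- ifelse(
  structural_zero, 0L,
  rpois(n, zi$population * exp(-3 + 0.5 * zi$predictor1))
)

# Count model (left of |) with an offset; intercept-only zero-inflation model (right of |)
fit_zip <- zeroinfl(students ~ predictor1 + offset(log(population)) | 1,
                    data = zi, dist = "poisson")

coef(fit_zip, model = "count")         # rate model on the log scale
plogis(coef(fit_zip, model = "zero"))  # estimated share of structural zeros
```

Putting predictors to the right of `|` instead of `1` would let the probability of a structural zero vary by school or program.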

I think glmmADMB will do a zero-inflated binomial model, but I haven't tested it.
