Logistic Unit Fixed Effect Model in R
Question
I'm trying to estimate a logistic unit fixed effects model for panel data using R. My dependent variable is binary and is measured daily over two years for 13 locations. The goal of the model is to predict the value of y for a particular day and location based on x.
set.seed(1)  # make the simulated data reproducible
zero <- c(0, 1)
ids <- data.frame(location = 1:13)
dates <- data.frame(date = seq(as.Date("2015-01-01"), as.Date("2016-12-31"), by = "days"))
data <- merge(dates, ids)  # no shared columns, so this is a cross join: 731 days x 13 locations = 9503 rows
data$y <- sample(zero, size = nrow(data), replace = TRUE)
data$x <- sample(zero, size = nrow(data), replace = TRUE)
While surveying the available packages, I've read about a number of ways to (apparently) do this, but I'm not confident I've understood the differences between the packages and approaches.
From what I have read so far, glm(), survival::clogit() and pglm::pglm() can all be used to do this, but I'm wondering whether there are substantial differences between the packages and what those might be.
Here are the calls I've used:
fixed <- glm(y ~ x + factor(location), data=data)
fixed <- clogit(y ~ x + strata(location), data=data)
One of the reasons for this uncertainty is the error I get when using pglm (also see this question), which says that pglm can't use the "within" model:
fixed <- pglm(y ~ x, data=data, index=c("location", "date"), model="within", family=binomial("logit"))
What distinguishes the "within" model of pglm from the approaches in glm() and clogit(), and which of the three would be the correct one to use here when trying to predict y for a given date and unit?
Recommended answer
I don't see that you have defined a proper hypothesis to test within the context of what you are calling "panel data". As far as getting glm to give estimates of logistic coefficients within strata, that can be accomplished by adding family="binomial" and stratifying by your "unit" variable (here a column named unit, playing the role of location in your example):
> fixed <- glm(y ~ x + strata(unit), data=data, family="binomial")
> fixed
Call:  glm(formula = y ~ x + strata(unit), family = "binomial", data = data)

Coefficients:
        (Intercept)                    x   strata(unit)unit=2   strata(unit)unit=3
            0.10287             -0.05910             -0.08302             -0.03020
 strata(unit)unit=4   strata(unit)unit=5   strata(unit)unit=6   strata(unit)unit=7
           -0.06876             -0.05042             -0.10200             -0.09871
 strata(unit)unit=8   strata(unit)unit=9  strata(unit)unit=10  strata(unit)unit=11
           -0.09702              0.02742             -0.13246             -0.04816
strata(unit)unit=12  strata(unit)unit=13
           -0.11449             -0.16986

Degrees of Freedom: 9502 Total (i.e. Null);  9489 Residual
Null Deviance:      13170
Residual Deviance:  13170    AIC: 13190
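To make the contrast with the clogit() call from the question concrete, here is a minimal sketch that fits both forms on a small simulated panel. The data frame sim, the seed and the reduced panel length (60 days rather than two years, to keep the exact conditional likelihood cheap) are assumptions for illustration only and are not part of the original data. glm() with unit dummies estimates one intercept shift per unit, while clogit() conditions the unit intercepts out and reports only the coefficient on x.

```r
library(survival)  # provides clogit() and strata()

set.seed(1)  # assumption: arbitrary seed, only for reproducibility

# Hypothetical small panel: 13 units observed on 60 consecutive days.
sim <- expand.grid(
  date = seq(as.Date("2015-01-01"), by = "day", length.out = 60),
  unit = 1:13
)
sim$x <- rbinom(nrow(sim), 1, 0.5)
sim$y <- rbinom(nrow(sim), 1, 0.5)

# Unconditional maximum likelihood: intercept, x, and 12 unit dummies.
fit_glm <- glm(y ~ x + factor(unit), data = sim, family = binomial)

# Conditional maximum likelihood: unit intercepts are conditioned out,
# so x is the only coefficient that is estimated.
fit_clogit <- clogit(y ~ x + strata(unit), data = sim)

coef(fit_glm)["x"]
coef(fit_clogit)["x"]
```

With many observations per unit the two estimates for x are typically close. Because clogit() never estimates the unit intercepts, the glm() form is probably the more direct route when the goal is a predicted probability for a specific unit.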
The glm() fit above does not take any date ordering into account, which is what I would have expected to be of interest. But as I said above, there doesn't yet appear to be a hypothesis that is premised on any sequential ordering.
The following would create a fixed effects model that includes a spline relationship of date to the probability of a y-event. I chose to center the date (with scale()) rather than leaving it as a very large integer:
library(splines)
fixed <- glm(y ~ x + ns(scale(date),3) + factor(unit), data=data, family="binomial")
fixed
#----------------------
Call:  glm(formula = y ~ x + ns(scale(date), 3) + factor(unit), family = "binomial",
    data = data)

Coefficients:
        (Intercept)                    x  ns(scale(date), 3)1  ns(scale(date), 3)2
            0.13389             -0.05904              0.04431             -0.10727
ns(scale(date), 3)3        factor(unit)2        factor(unit)3        factor(unit)4
           -0.03224             -0.08302             -0.03020             -0.06877
      factor(unit)5        factor(unit)6        factor(unit)7        factor(unit)8
           -0.05042             -0.10201             -0.09872             -0.09702
      factor(unit)9       factor(unit)10       factor(unit)11       factor(unit)12
            0.02742             -0.13246             -0.04816             -0.11450
     factor(unit)13
           -0.16987

Degrees of Freedom: 9502 Total (i.e. Null);  9486 Residual
Null Deviance:      13170
Residual Deviance:  13160    AIC: 13200
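Since the stated goal is to predict y for a given date and unit, here is a hedged sketch of how a fitted model of this kind could be used with predict(). The simulated data frame sim, the helper column date_z, the seed and the example row (unit 5 on 2016-06-15 with x = 1) are all assumptions for illustration. The date is standardised by hand, a choice made here so that exactly the same transformation is applied to new rows rather than relying on how scale() inside the formula is re-evaluated on new data.

```r
library(splines)  # provides ns()

set.seed(2)  # assumption: arbitrary seed, only for reproducibility

# Hypothetical panel with the same shape as in the question.
sim <- expand.grid(
  date = seq(as.Date("2015-01-01"), as.Date("2016-12-31"), by = "day"),
  unit = 1:13
)
sim$x <- rbinom(nrow(sim), 1, 0.5)
sim$y <- rbinom(nrow(sim), 1, 0.5)

# Standardise the date manually and keep the constants for later reuse.
date_mean <- mean(as.numeric(sim$date))
date_sd   <- sd(as.numeric(sim$date))
sim$date_z <- (as.numeric(sim$date) - date_mean) / date_sd

fit <- glm(y ~ x + ns(date_z, 3) + factor(unit), data = sim, family = binomial)

# One new observation: unit 5 on a chosen date, with x = 1.
new_row <- data.frame(
  x = 1,
  unit = 5,
  date_z = (as.numeric(as.Date("2016-06-15")) - date_mean) / date_sd
)

# type = "response" returns a probability rather than a log-odds value.
p_hat <- predict(fit, newdata = new_row, type = "response")
p_hat
```

The predicted probability can then be turned into a 0/1 prediction with whatever cutoff suits the application; 0.5 is a common default but not a requirement.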