Logistic Unit Fixed Effect Model in R

Question

I'm trying to estimate a logistic unit fixed effects model for panel data using R. My dependent variable is binary and measured daily over two years for 13 locations. The goal of this model is to predict the value of y for a particular day and location based on x.

set.seed(1)  # make the simulated data reproducible
zero <- 0:1
ids <- data.frame(location = 1:13)
dates <- data.frame(date = seq(as.Date("2015-01-01"), as.Date("2016-12-31"), by = "days"))
# No shared columns, so merge() returns the cross join: 731 days x 13 locations = 9503 rows
data <- merge(dates, ids)
data$y <- sample(zero, size = nrow(data), replace = TRUE)
data$x <- sample(zero, size = nrow(data), replace = TRUE)

While surveying the available packages, I've read about a number of ways to (apparently) do this, but I'm not confident I've understood the differences between packages and approaches.

From what I have read so far, glm(), survival::clogit() and pglm::pglm() can be used to do this, but I'm wondering if there are substantial differences between the packages and what those might be. Here are the calls I've used:

fixed <- glm(y ~ x + factor(location), data=data)
fixed <- clogit(y ~ x + strata(location), data=data)

One of the reasons for this uncertainty is the error I get when using pglm (also see this question), which says that pglm can't use the "within" model:

fixed <- pglm(y ~ x, data=data, index=c("location", "date"), model="within", family=binomial("logit"))

What distinguishes the "within" model of pglm from the approaches in glm() and clogit() and which of the three would be the correct one to take here when trying to predict y for a given date and unit?

Recommended answer

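I don't see that you have defined a proper hypothesis to test within the context of what you are calling "panel data", but as far as getting glm to give estimates for logistic coefficients within strata, it can be accomplished by adding family="binomial" and stratifying by your "unit" variable (the code below appears to assume that the location column has been renamed unit and that the survival package, which provides strata(), is loaded):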

> fixed <- glm(y ~ x + strata(unit), data=data, family="binomial")
> fixed

Call:  glm(formula = y ~ x + strata(unit), family = "binomial", data = data)

Coefficients:
        (Intercept)                    x   strata(unit)unit=2   strata(unit)unit=3  
            0.10287             -0.05910             -0.08302             -0.03020  
 strata(unit)unit=4   strata(unit)unit=5   strata(unit)unit=6   strata(unit)unit=7  
           -0.06876             -0.05042             -0.10200             -0.09871  
 strata(unit)unit=8   strata(unit)unit=9  strata(unit)unit=10  strata(unit)unit=11  
           -0.09702              0.02742             -0.13246             -0.04816  
strata(unit)unit=12  strata(unit)unit=13  
           -0.11449             -0.16986  

Degrees of Freedom: 9502 Total (i.e. Null);  9489 Residual
Null Deviance:      13170 
Residual Deviance: 13170    AIC: 13190

That will not take into account any date-ordering, which is what I would have expected to be the interest. But as I said above, there doesn't yet appear to be a hypothesis that is premised on any sequential ordering.
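On the question of how glm() and clogit() differ, a minimal sketch may help. It is not from the original answer: the small simulated panel (60 days rather than 731, to keep the exact conditional likelihood cheap), the true slope of 0.8 and the names panel, alpha, fit_glm and fit_clogit are all assumptions made for illustration. glm() with factor(location) estimates one intercept shift per location by ordinary maximum likelihood, while clogit() conditions the location effects out of the likelihood and never estimates them.

```r
library(survival)  # ships with R; provides clogit() and strata()

set.seed(42)
# Hypothetical small panel: 13 locations observed on 60 days each.
panel <- expand.grid(day = 1:60, location = 1:13)
alpha <- rnorm(13, sd = 0.5)  # assumed location-specific intercepts
panel$x <- rbinom(nrow(panel), 1, 0.5)
panel$y <- rbinom(nrow(panel), 1, plogis(alpha[panel$location] + 0.8 * panel$x))

# Unconditional ML: each location gets its own dummy coefficient.
fit_glm <- glm(y ~ x + factor(location), data = panel, family = binomial)

# Conditional ML: location effects are conditioned out and never estimated.
fit_clogit <- clogit(y ~ x + strata(location), data = panel)

b_glm <- coef(fit_glm)["x"]
b_clogit <- coef(fit_clogit)["x"]
c(glm = unname(b_glm), clogit = unname(b_clogit))

# Only the glm fit carries location intercepts.
length(coef(fit_glm))     # intercept + x + 12 location dummies
length(coef(fit_clogit))  # x only
```

Because the clogit fit has no location intercepts, it cannot by itself return a predicted probability for a specific location, which matters if prediction is the goal. The dummy-variable estimator is known to be biased when each unit has only a few observations; with several hundred days per location, as in the question, the two slope estimates would typically be close.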

The following would create a fixed effects model that includes a spline relationship between date and the probability of a y-event. I chose to center the date rather than leaving it as a very large integer:

library(splines)
fixed <- glm(y ~ x + ns(scale(date),3) + factor(unit), data=data, family="binomial")
fixed
#----------------------
Call:  glm(formula = y ~ x + ns(scale(date), 3) + factor(unit), family = "binomial", 
    data = data)

Coefficients:
        (Intercept)                    x  ns(scale(date), 3)1  ns(scale(date), 3)2  
            0.13389             -0.05904              0.04431             -0.10727  
ns(scale(date), 3)3        factor(unit)2        factor(unit)3        factor(unit)4  
           -0.03224             -0.08302             -0.03020             -0.06877  
      factor(unit)5        factor(unit)6        factor(unit)7        factor(unit)8  
           -0.05042             -0.10201             -0.09872             -0.09702  
      factor(unit)9       factor(unit)10       factor(unit)11       factor(unit)12  
            0.02742             -0.13246             -0.04816             -0.11450  
     factor(unit)13  
           -0.16987  

Degrees of Freedom: 9502 Total (i.e. Null);  9486 Residual
Null Deviance:      13170 
Residual Deviance: 13160    AIC: 13200
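Since the stated goal is to predict y for a given date and location, here is a hedged sketch of how a fitted glm could be used for that. It is an illustration, not part of the original answer: the simulated data, the query values and the names panel, date_mean, date_sd, t, new_obs and p_hat are assumptions. The date is standardised once into an explicit column t, and the constants are kept, so the same transformation can be applied to a new date at prediction time instead of relying on scale() inside the formula.

```r
library(splines)

set.seed(1)
# Simulated panel with the question's layout: 731 days x 13 locations.
panel <- expand.grid(
  date = seq(as.Date("2015-01-01"), as.Date("2016-12-31"), by = "day"),
  location = 1:13
)
panel$x <- rbinom(nrow(panel), 1, 0.5)
panel$y <- rbinom(nrow(panel), 1, 0.5)

# Standardise the date once and keep the constants for later reuse.
date_mean <- mean(as.numeric(panel$date))
date_sd <- sd(as.numeric(panel$date))
panel$t <- (as.numeric(panel$date) - date_mean) / date_sd

fit <- glm(y ~ x + ns(t, 3) + factor(location), data = panel, family = binomial)

# Hypothetical query: location 7 on 2016-03-15 with x = 1.
new_obs <- data.frame(
  x = 1,
  location = 7,
  t = (as.numeric(as.Date("2016-03-15")) - date_mean) / date_sd
)

# type = "response" returns a probability rather than log-odds.
p_hat <- predict(fit, newdata = new_obs, type = "response")
p_hat
```

With purely random y, as in the question's simulated data, the predicted probability should sit near 0.5; on real data it would reflect the location intercept, the effect of x and the smooth date trend.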
