How to run regressions on multidimensional panel data in R

Question

I need to run a regression on panel data with three dimensions (year * company * country). For example:

============================================
 year | comp | count |  value.x |  value.y
------+------+-------+----------+-----------
 2000 |   A  |  USA  |  1029.0  |  239481   
------+------+-------+----------+-----------
 2000 |   A  |  CAN  |  2341.4  |  129333   
------+------+-------+----------+-----------
 2000 |   B  |  USA  |  2847.7  |  187319   
------+------+-------+----------+-----------
 2000 |   B  |  CAN  |  4820.5  |  392039
------+------+-------+----------+-----------
 2001 |   A  |  USA  |  7289.9  |  429481
------+------+-------+----------+-----------
 2001 |   A  |  CAN  |  5067.3  |  589143
------+------+-------+----------+-----------
 2001 |   B  |  USA  |  7847.8  |  958234
------+------+-------+----------+-----------
 2001 |   B  |  CAN  |  9820.0  | 1029385
============================================
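To make the example reproducible, here is a sketch that types the sample table into an R data frame (the name `panel` is my own choice, not the asker's `dataname`). It also shows why one individual index is not enough here: a company appears once per country in each year, so only the (comp, count, year) combination identifies a row.

```r
# The sample panel from the table above: each row is one
# company (comp) in one country (count) in one year.
panel <- data.frame(
  year    = rep(c(2000, 2001), each = 4),
  comp    = rep(c("A", "A", "B", "B"), times = 2),
  count   = rep(c("USA", "CAN"), times = 4),
  value.x = c(1029.0, 2341.4, 2847.7, 4820.5, 7289.9, 5067.3, 7847.8, 9820.0),
  value.y = c(239481, 129333, 187319, 392039, 429481, 589143, 958234, 1029385)
)

# A (comp, year) pair is not unique: the same company appears once per country.
anyDuplicated(panel[c("comp", "year")]) > 0            # TRUE

# Adding the country makes every row unique.
anyDuplicated(panel[c("comp", "count", "year")]) == 0  # TRUE
```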

However, the R package plm does not seem able to handle more than two dimensions.

I tried:

result <- plm(value.y ~ value.x, data = dataname, index = c("comp","count","year"))

and it returned this error:

Error in pdata.frame(data, index) : 
'index' can be of length 2 at the most (one individual and one time index)

How do you run regressions when the panel data (individual * time) has more than one dimension within the "individual"?

In case anyone encounters the same situation, I'll put my solutions here:

R seems unable to cope with this situation, and the only thing you can do is add dummy variables. If the categorical variables you build dummies from contain too many categories to write them out by hand, you can try this helper:

# Add one 0/1 dummy column per category of column `colnum`, leaving the last
# category (in order of appearance) out as the reference level.
# With interaction = TRUE, add dummy * x columns instead, where x is the
# column number `interaction_varnum`.
makedummy <- function(colnum, data, interaction = FALSE, interaction_varnum) {
  colname <- colnames(data)[colnum]
  cats <- unique(data[, colnum])
  # seq_len() avoids looping over 1:0 when there is only one category
  for (i in seq_len(length(cats) - 1)) {
    is_cat <- data[, colnum] == cats[i]
    if (interaction) {
      # dummy * x: the value of x where the category matches, 0 elsewhere
      new_col <- ifelse(is_cat, data[, interaction_varnum], 0)
      new_name <- paste(colname, "dummy", "int", cats[i], sep = ".")
    } else {
      new_col <- as.numeric(is_cat)
      new_name <- paste(colname, "dummy", cats[i], sep = ".")
    }
    data[[new_name]] <- new_col
  }
  data
}

For example:

## Create fake data
fakedata <- matrix(rnorm(300),nrow = 100)
cate <- LETTERS[sample(seq(1,10),100, replace = TRUE)]
fakedata <- cbind.data.frame(cate,fakedata)

## Try this
fakedata <- makedummy(1,fakedata)

## To add dummy*x terms and check whether the coefficient on x differs across categories, try this
fakedata <- makedummy(1,fakedata,interaction = TRUE,interaction_varnum = 2)
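Base R can also generate these columns without a custom helper. The sketch below uses `model.matrix()` on a small hypothetical data frame `toy` (a factor `cate` and a numeric `x`); it is an alternative to `makedummy`, not the asker's code. One difference: `model.matrix()` drops the first factor level as the reference, while `makedummy` drops the last category in order of appearance.

```r
# Hypothetical toy data: a categorical column and a numeric regressor x.
toy <- data.frame(
  cate = factor(rep(LETTERS[1:4], times = 5)),
  x    = (1:20) / 10
)

# One 0/1 column per category except the reference level "A"
# (drop the intercept column).
dummies <- model.matrix(~ cate, data = toy)[, -1]

# dummy * x columns: x where the row is in that category, 0 elsewhere.
# Keeping x as a main effect makes R code the interaction with contrasts,
# so the reference level is dropped here too; then drop the intercept and x.
interactions <- model.matrix(~ x + x:cate, data = toy)[, -(1:2)]

toy <- cbind(toy, dummies, interactions)
colnames(toy)
```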

It may be a bit verbose; I haven't polished it. Any advice is welcome. With the dummies in place, you can run OLS on your data.
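For the OLS step, `lm()` can also create the dummies itself when factors appear in the formula, which is the approach discussed in the first question linked in the answer below. A sketch assuming the sample table as the data (rebuilt here as `panel`), with one effect per company-country pair and one per year:

```r
# Sample panel from the question.
panel <- data.frame(
  year    = rep(c(2000, 2001), each = 4),
  comp    = rep(c("A", "A", "B", "B"), times = 2),
  count   = rep(c("USA", "CAN"), times = 4),
  value.x = c(1029.0, 2341.4, 2847.7, 4820.5, 7289.9, 5067.3, 7847.8, 9820.0),
  value.y = c(239481, 129333, 187319, 392039, 429481, 589143, 958234, 1029385)
)

# interaction(comp, count) gives one level per company-country pair;
# lm() expands it and factor(year) into dummies, dropping a reference level of each.
fit <- lm(value.y ~ value.x + interaction(comp, count) + factor(year),
          data = panel)
coef(fit)
```

With 8 rows this fits 6 coefficients (intercept, `value.x`, 3 pair dummies, 1 year dummy), so it only illustrates the mechanics; real data would need many more observations.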

Answer

This question is very similar to these:

  • fixed effects in R: plm vs lm + factor()
  • Fixed Effects plm package R - multiple observations per year/id

If you would rather not create new dummies, you can use the group_indices function from the dplyr package. Although it does not work inside mutate, the following approach is straightforward:

library(dplyr)
fakedata$id <- fakedata %>% group_indices(comp, count)

The id variable becomes your individual (first) panel dimension, so set the plm index argument to index = c("id", "year").

For alternatives you can take a look at this question: R create ID within a group.
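One base-R alternative to `group_indices()`, sketched on the index columns of the sample table: `interaction()` combines `comp` and `count` into one factor and `as.integer()` turns it into a numeric id. The check at the end confirms that each (id, year) pair identifies exactly one row, which is what plm expects of an (individual, time) index. The `plm()` call is left as a comment because it needs the plm package and the full data.

```r
# Index columns of the sample panel from the question.
panel <- data.frame(
  year  = rep(c(2000, 2001), each = 4),
  comp  = rep(c("A", "A", "B", "B"), times = 2),
  count = rep(c("USA", "CAN"), times = 4)
)

# One integer id per (comp, count) pair: the "individual" dimension.
panel$id <- as.integer(interaction(panel$comp, panel$count, drop = TRUE))

# Each (id, year) pair must pick out a single row.
stopifnot(anyDuplicated(panel[c("id", "year")]) == 0)

# With the full data, the regression would then be:
# plm(value.y ~ value.x, data = panel, index = c("id", "year"))
```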
