Data Transformation in R for Panel Regression


Problem description

I need your help with a problem that may seem easy to solve.

I am currently working on a project that involves panel regressions. I have several large CSV files (up to 12 million entries per sheet) formatted as in the attached picture: the columns (V1, V2) are individuals and the rows (1, 2, 3) are time identifiers.

In order to use the plm() function, I need to convert all these files to the following data structure:

ID Time X1 X2
1 1 x1 x2
1 2 x1 x2
1 ... ... ...
2 1 x1 x2
2 2 ... ...

I am really struggling with this transformation. In particular, where do I get the identifier and the time index from? I would appreciate any information on how to solve this problem.

If my question is not clear, just ask.

Best regards and thanks in advance

The output should look as follows:

Solution

 mydata<-structure(list(V1 = 10:13, V2 = 21:24, V3 = c(31L, 32L, 3L, 34L
    )), .Names = c("V1", "V2", "V3"), class = "data.frame", row.names = c(NA, 
    -4L))

> mydata
  V1 V2 V3
1 10 21 31
2 11 22 32
3 12 23  3
4 13 24 34

The following code can be used on your data without any changes; the small data frame above is only for illustration. It uses the base R reshape() function.

# Stack all columns: the column name becomes "id", the row name becomes "time".
long <- reshape(mydata, direction = "long",
                varying = list(names(mydata)), v.names = "value",
                timevar = "id", times = names(mydata),
                idvar = "time", ids = row.names(mydata),
                new.row.names = seq_len(nrow(mydata) * ncol(mydata)))

> long
   id value time
1  V1    10    1
2  V1    11    2
3  V1    12    3
4  V1    13    4
5  V2    21    1
6  V2    22    2
7  V2    23    3
8  V2    24    4
9  V3    31    1
10 V3    32    2
11 V3     3    3
12 V3    34    4
long$id <- substr(long$id, 2, 4)  # drop the leading "V"; stop = 4 keeps up to three digits, enough for your 416 variables
myout <- long[, c(1, 3, 2)]       # reorder the columns to id, time, value
> myout
   id time value
1   1    1    10
2   1    2    11
3   1    3    12
4   1    4    13
5   2    1    21
6   2    2    22
7   2    3    23
8   2    4    24
9   3    1    31
10  3    2    32
11  3    3     3
12  3    4    34
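
The question asks for one row per individual and time point with several variables (X1, X2) side by side, while the reshape above handles one file at a time. A minimal base R sketch of that last step follows. It assumes each file holds one variable, that every column in a file is numeric, and that the columns are named V1, V2, ... as in the example. The helper wide_to_long() and the toy table x2_wide are hypothetical names made up for illustration; they are not part of the original answer.

```r
# Hypothetical helper: stack one wide table (rows = time, columns = individuals)
# into long format. Assumes column names of the form V1, V2, ...
wide_to_long <- function(wide, value_name) {
  n_time <- nrow(wide)
  n_id   <- ncol(wide)
  out <- data.frame(
    # individual identifier: the column number, repeated once per time point
    ID   = rep(as.integer(sub("^V", "", names(wide))), each = n_time),
    # time index: the row number, cycled once per individual
    Time = rep(seq_len(n_time), times = n_id)
  )
  # unlist() walks the data frame column by column, matching the ID/Time order
  out[[value_name]] <- unlist(wide, use.names = FALSE)
  out
}

# Toy inputs: x1_wide repeats the example data, x2_wide is made up.
x1_wide <- data.frame(V1 = 10:13, V2 = 21:24, V3 = c(31L, 32L, 3L, 34L))
x2_wide <- data.frame(V1 = 1:4, V2 = 5:8, V3 = 9:12)

# Join the two long tables on the (ID, Time) key and sort them as a panel.
panel <- merge(wide_to_long(x1_wide, "X1"),
               wide_to_long(x2_wide, "X2"),
               by = c("ID", "Time"))
panel <- panel[order(panel$ID, panel$Time), ]
rownames(panel) <- NULL
```

If the plm package is installed, a data frame shaped like panel can then be passed to plm() with index = c("ID", "Time") so that it knows which columns identify the individual and the time period.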
