Block sampling according to index in panel data

Problem description

I have panel data, i.e. t rows for each of n units (n × t rows in total). For example (simplified to three years per firm):

data("Grunfeld", package="plm")
head(Grunfeld)
firm year   inv  value capital
   1 1935 317.6 3078.5     2.8
   1 1936 391.8 4661.7    52.6
   1 1937 410.6 5387.1   156.9
   2 1935 257.7 2792.2   209.2
   2 1936 330.8 4313.2   203.4
   2 1937 461.2 4643.9   207.2

I want to do a block bootstrap. That is, I want to resample with replacement, drawing firm [i] together with all the years in which it is observed. For instance, if year=1935:1937 and firm 1 is randomly drawn, firm [1] should appear in the new sample 3 times, once for each year in 1935:1937. If it is drawn again, it must again appear 3 times. I also need to apply my own function to each bootstrapped sample, and I need to do this 500 times. My current code is something like this:
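The following sketch makes the requirement concrete. It assumes a toy panel of three firms observed in 1935:1937. It also uses a fixed, illustrative draw of firms 2, 1, 2 in place of a random `sample(..., replace = TRUE)` call, so the result is reproducible. Each drawn firm contributes all three of its years, and a firm drawn twice contributes them twice.

```r
# toy panel: 3 firms x 3 years (illustrative only)
panel <- data.frame(firm = rep(1:3, each = 3),
                    year = rep(1935:1937, times = 3))

# fixed draw standing in for sample(unique(panel$firm), 3, replace = TRUE)
drawn <- c(2, 1, 2)

# for each drawn firm, collect the row numbers of all its years, then stack them
rows <- unlist(lapply(drawn, function(f) which(panel$firm == f)))
boot_sample <- panel[rows, ]
boot_sample$firm   # 2 2 2 1 1 1 2 2 2
```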

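library(boot)
# statistic: must be computed on the (resampled) data it receives, not on Grunfeld itself
boot.fun <- function(data) {
   est.boot <- myfunction(y = data$v1, x = data$v2, other parameters)
   return(est.boot)
}
# ran.gen: should return one block-resampled copy of the data
boot.sim <- function(data, mle) {
   data <- sample(data, ?? )   # how to resample whole firms here?
   return(data)
}

start.time <- Sys.time()
result.boot <- boot(Grunfeld, boot.fun, R = 500, sim = "parametric",
                    ran.gen = boot.sim)
Sys.time() - start.time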

I was thinking of resampling by filling in data = sample(data, ?? ) correctly, with the firm column as the index, because that approach is smooth and clean. How could I do that? Is there a more efficient alternative?

EDIT. I do not necessarily need a new boot.fun. I just need some (ideally fast) code that resamples firms with replacement. I will then pass it to boot as ran.gen=code.which.works. The output should have the same dimensions as the original sample, even though a firm may be picked twice or more (or not at all). For instance, the result could be:

head(GrunfeldResampled)
firm year   inv  value capital
   2 1935 257.7 2792.2   209.2
   2 1936 330.8 4313.2   203.4
   2 1937 461.2 4643.9   207.2
   1 1935 317.6 3078.5    2.8
   1 1936 391.8 4661.7    52.6
   1 1937 410.6 5387.1   156.9
   2 1935 257.7 2792.2   209.2
   2 1936 330.8 4313.2   203.4
   2 1937 461.2 4643.9   207.2
   9 1935 317.6 3078.5   122.8
   9 1936 391.8 4661.7   342.6
   9 1937 410.6 5387.1   156.9

Basically, each firm must be treated as a block, so the resampling should apply to whole blocks. I hope this clarifies things.
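A firm can be drawn more than once, so the resampled data can contain repeated (firm, year) pairs, as firm 2 does in the example above. If the estimator needs each drawn block treated as a separate unit, one option is to add a block counter. This is a small sketch. It assumes blocks are stored contiguously with years in increasing order, and the column name `block` is hypothetical. A new block starts wherever the firm changes or the year stops increasing.

```r
# resampled panel laid out like the example above (other columns omitted)
res <- data.frame(firm = c(2, 2, 2, 1, 1, 1, 2, 2, 2, 9, 9, 9),
                  year = rep(1935:1937, times = 4))

# TRUE at the first row, and wherever the firm changes or the year does not
# increase (i.e. the same firm was drawn again right after itself)
new_block <- c(TRUE, diff(res$firm) != 0 | diff(res$year) <= 0)
res$block <- cumsum(new_block)
res$block   # 1 1 1 2 2 2 3 3 3 4 4 4
```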

Recommended answer

In the Grunfeld data every firm is observed for exactly 20 years (a balanced panel), which makes it easy to demonstrate:

data("Grunfeld", package="plm") #load data

Solution

# n is the firm column, df is the data frame
myfunc <- function(n, df) {
  unique_firms <- unique(n)   # unique firms
  sample_firms <- sample(unique_firms, size = length(unique_firms), replace = TRUE)  # draw firms at random with replacement
  new_df <- do.call(rbind, lapply(sample_firms, function(x) df[df$firm == x, ]))  # fetch all years of each drawn firm and rbind
  new_df   # return the resampled data frame
}

a <- myfunc(Grunfeld$firm, Grunfeld) #run function 

Output

> str(a)
'data.frame':   200 obs. of  5 variables:
 $ firm   : int  4 4 4 4 4 4 4 4 4 4 ...
 $ year   : int  1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 ...
 $ inv    : num  40.3 72.8 66.3 51.6 52.4 ...
 $ value  : num  418 838 884 438 680 ...
 $ capital: num  10.5 10.2 34.7 51.8 64.3 67.1 75.2 71.4 67.1 60.5 ...

As you can see, the dimensions (200 obs. of 5 variables) are exactly the same as those of the input data.frame.
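Matching dimensions alone do not prove that whole blocks were kept. In an unbalanced panel, the dimensions can also differ even when the resample is a valid block resample. The sketch below adds a second check: every firm in the resample must appear a whole number of times its original block size. The helper `is_block_resample` is hypothetical, and it uses a toy panel with hand-picked rows instead of Grunfeld.

```r
# Hypothetical helper: TRUE if new_df has the same dimensions as orig_df and
# every firm in new_df appears as a whole number of complete blocks.
is_block_resample <- function(new_df, orig_df, id = "firm") {
  block_size <- table(orig_df[[id]])   # rows per firm in the original data
  new_count  <- table(new_df[[id]])    # rows per firm in the resample
  same_dim   <- identical(dim(new_df), dim(orig_df))
  whole      <- isTRUE(all(as.vector(new_count) %%
                           as.vector(block_size[names(new_count)]) == 0))
  same_dim && whole
}

# toy balanced panel: 3 firms x 3 years
toy <- data.frame(firm = rep(1:3, each = 3), year = rep(1935:1937, times = 3))

good <- toy[c(4:6, 1:3, 4:6), ]   # firms 2, 1, 2: whole blocks
bad  <- toy[c(1:2, 4:9, 9), ]     # firm 1 cut short, one firm-3 row duplicated
is_block_resample(good, toy)      # TRUE
is_block_resample(bad, toy)       # FALSE
```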

For your data, where the panel unit is the country column, the solution will be:

myfunc <- function(n, df) {   # n is the country column, df is the data frame
  unique_firms <- unique(n)   # unique countries (blocks)
  sample_firms <- sample(unique_firms, size = length(unique_firms), replace = TRUE)  # draw countries at random with replacement
  new_df <- do.call(rbind, lapply(sample_firms, function(x) df[df$country == x, ]))  # fetch all years of each drawn country and rbind
  new_df   # return the resampled data frame
}

And the output:

> str(a)
'data.frame':   848 obs. of  18 variables:
 $ isocode  : Factor w/ 106 levels "AGO","ALB","ARG",..: 82 82 82 82 82 82 82 82 61 61 ...
 $ time     : int  2 3 4 5 6 7 8 9 2 3 ...
 $ country  : num  80 80 80 80 80 80 80 80 59 59 ...
 $ year     : int  1975 1980 1985 1990 1995 2000 2005 2010 1975 1980 ...
 $ gdp      : num  184619 210169 199343 268870 305255 ...
 $ pop      : num  33.4 34.9 36.6 37.8 38.3 ...
 $ gdp_k    : num  5526 6022 5443 7117 7969 ...
 $ co2      : num  340353 431436 426881 431052 350874 ...
 $ co2_k    : num  10191 12333 11674 11407 9128 ...
 $ oecd     : int  1 1 1 1 1 1 1 1 1 1 ...
 $ LI       : int  0 0 0 0 0 0 0 0 0 0 ...
 $ LMI      : int  0 0 0 0 0 0 0 0 0 0 ...
 $ UMI      : int  0 0 0 0 0 0 0 0 0 0 ...
 $ HI       : int  1 1 1 1 1 1 1 1 1 1 ...
 $ gdpk     : num  5531 6018 5449 7118 7971 ...
 $ co2k     : num  10196 12355 11668 11412 9162 ...
 $ co2_k.lag: num  8595 10191 12333 11674 11407 ...
 $ gdp_k.lag: num  4730 5526 6022 5443 7117 ...
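Here is how a resampler like this plugs into the boot() call from the question. With sim = "parametric", boot calls ran.gen(data, mle) to produce each replicate and then calls the statistic on the result. A block resampler can therefore be passed as ran.gen. This sketch makes several assumptions:

- It uses simulated data rather than the question's data.
- A hypothetical OLS-slope statistic stands in for the asker's unknown myfunction.
- The resampler `block_gen` is a variant of myfunc that groups row indices by firm with split() and subsets once, instead of rbind-ing one data frame per firm. This may be faster on larger panels, but it has not been benchmarked here.

```r
library(boot)

set.seed(123)
# simulated balanced panel: 10 firms x 20 years (illustrative, not Grunfeld)
panel <- data.frame(firm = rep(1:10, each = 20),
                    year = rep(1935:1954, times = 10))
panel$x <- rnorm(nrow(panel))
panel$y <- 1 + 2 * panel$x + rnorm(nrow(panel))

# ran.gen: with sim = "parametric", boot() calls ran.gen(data, mle) and uses
# the returned data set; here whole firms are drawn with replacement
block_gen <- function(data, mle) {
  rows  <- split(seq_len(nrow(data)), data$firm)               # row indices grouped by firm
  drawn <- sample(length(rows), length(rows), replace = TRUE)  # draw firm positions
  data[unlist(rows[drawn], use.names = FALSE), , drop = FALSE] # subset once
}

# hypothetical statistic: with sim = "parametric" it receives only the data
slope <- function(data) coef(lm(y ~ x, data = data))[["x"]]

result.boot <- boot(panel, slope, R = 500, sim = "parametric", ran.gen = block_gen)
result.boot
```

The resampler draws firm positions rather than firm values, which avoids the sample() quirk where a single numeric value x is treated as 1:x. If myfunction uses the firm identifier (for example, firm fixed effects), a firm drawn twice shows up as duplicated (firm, year) pairs. A common remedy in cluster bootstrapping is to give each drawn block its own new identifier.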
