Block sampling according to index in panel data
Problem description
I have panel data, i.e. t rows for each of n observations (n×t), such as
data("Grunfeld", package="plm")
head(Grunfeld)
firm year inv value capital
1 1935 317.6 3078.5 2.8
1 1936 391.8 4661.7 52.6
1 1937 410.6 5387.1 156.9
2 1935 257.7 2792.2 209.2
2 1936 330.8 4313.2 203.4
2 1937 461.2 4643.9 207.2
I want to do a block bootstrap, i.e. resample with replacement, drawing a firm [i] together with all the years in which it is observed. For instance, if year=1935:1937 and firm 1 is randomly drawn, I want firm 1 to appear in the new sample 3 times, once for each year in 1935:1937. If it is drawn again, it must again appear 3 times. Furthermore, I need to apply my own function to each new bootstrapped sample, and I need to do this 500 times.
My current code is something like this:
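library(boot)
boot.fun <- function(data) { # statistic: computed on each resampled data set
est.boot = myfunction(y=data$v1, x=data$v2, other parameters) # use data, not the global Grunfeld
return(est.boot)
}
boot.sim <- function(data, mle) { # ran.gen: boot() calls ran.gen(data, mle) when sim = "parametric"
data = sample(data, ?? ) # how should whole firm blocks be resampled here?
return(data)
}
start.time = Sys.time()
result.boot <- boot(Grunfeld, boot.fun, R=500, sim = "parametric",
ran.gen = boot.sim)
Sys.time() - start.time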
I was thinking of resampling by correctly specifying data = sample(data, ?? ), as that works smoothly and cleanly, using the column firm as the index. How could I do that? Is there a more efficient alternative?
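Note that sample() applied directly to a data.frame cannot do this on its own: a data.frame is a list of columns, so sample(data) shuffles the columns, not the rows. The minimal sketch below uses a hypothetical toy panel (the name toy and its values are made up for illustration) to contrast that with drawing firm ids with replacement and then collecting every row of each drawn firm.

```r
# Hypothetical toy panel: 3 firms, each observed in 2 years
toy <- data.frame(firm = rep(1:3, each = 2),
                  year = rep(c(1935, 1936), times = 3),
                  inv  = c(10, 11, 20, 21, 30, 31))

set.seed(1)
# sample() on a data.frame permutes its columns, because length(toy)
# is the number of columns and toy[i] selects columns
shuffled_cols <- sample(toy)
print(names(shuffled_cols))  # the same 3 column names, possibly reordered
print(nrow(shuffled_cols))   # still 6 rows: no rows were resampled

# Block resampling: draw firm ids with replacement, then take all the
# rows (all years) belonging to each drawn firm, in the order drawn
ids   <- unique(toy$firm)
drawn <- sample(ids, size = length(ids), replace = TRUE)
rows  <- unlist(lapply(drawn, function(f) which(toy$firm == f)))
boot_sample <- toy[rows, ]
print(boot_sample)
```

Because each drawn firm contributes its complete block, a firm drawn twice appears with both of its years twice.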
EDIT.
I do not necessarily need a new boot.function. I just need (ideally fast) code that resamples with replacement; I'll then put it inside the boot call as ran.gen=code.which.works.
The output should be a sample with the same dimensions as the original, even though some firms may be picked twice or more (and others not at all). For instance, the result could be
head(GrunfeldResampled)
firm year inv value capital
2 1935 257.7 2792.2 209.2
2 1936 330.8 4313.2 203.4
2 1937 461.2 4643.9 207.2
1 1935 317.6 3078.5 2.8
1 1936 391.8 4661.7 52.6
1 1937 410.6 5387.1 156.9
2 1935 257.7 2792.2 209.2
2 1936 330.8 4313.2 203.4
2 1937 461.2 4643.9 207.2
9 1935 317.6 3078.5 122.8
9 1936 391.8 4661.7 342.6
9 1937 410.6 5387.1 156.9
Basically, I need each firm treated as a block, so the resampling should apply to the whole block. Hope this clarifies.
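One way to make this block requirement checkable is a small helper that tests whether every firm in a resample appears as one or more complete copies of its original set of years. The sketch below is an illustration under assumptions: is_block_resample, its id/time arguments and the toy data are hypothetical names, not part of boot or plm.

```r
# TRUE if every firm in `resampled` appears a whole number of times,
# each time with exactly the years it has in `original`
is_block_resample <- function(original, resampled, id = "firm", time = "year") {
  orig_years <- split(original[[time]], original[[id]])
  new_years  <- split(resampled[[time]], resampled[[id]])
  all(vapply(names(new_years), function(f) {
    base <- orig_years[[f]]
    if (is.null(base)) return(FALSE)   # firm not present in the original
    got <- sort(new_years[[f]])
    k   <- length(got) / length(base)  # number of times firm f was drawn
    k == round(k) && identical(got, sort(rep(base, k)))
  }, logical(1)))
}

# Hypothetical toy panel: 3 firms x 3 years
toy  <- data.frame(firm = rep(1:3, each = 3), year = rep(1935:1937, 3))
good <- toy[c(4:6, 1:3, 4:6), ]             # firm 2 twice, firm 1 once
bad  <- toy[c(1, 1, 4, 5, 6, 7, 8, 9, 9), ] # firms 1 and 3 split up
is_block_resample(toy, good)  # TRUE
is_block_resample(toy, bad)   # FALSE
```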
Recommended answer
Apparently in this data set every firm is observed for exactly 20 years, so demonstrating this is straightforward:
data("Grunfeld", package="plm") #load data
Solution
#n is the firms column, df is the data frame
myfunc <- function(n,df) { #define function
unique_firms <- unique(n) #unique firms
sample_firms <- sample(unique_firms, size=length(unique_firms), replace=TRUE) #choose from unique firms randomly with replacement
new_df <- do.call(rbind, lapply(sample_firms, function(x) df[n==x,] )) #fetch all years for each randomly picked firm and rbind
return(new_df) #return the resampled data frame
}
a <- myfunc(Grunfeld$firm, Grunfeld) #run function
Output
> str(a)
'data.frame': 200 obs. of 5 variables:
$ firm : int 4 4 4 4 4 4 4 4 4 4 ...
$ year : int 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 ...
$ inv : num 40.3 72.8 66.3 51.6 52.4 ...
$ value : num 418 838 884 438 680 ...
$ capital: num 10.5 10.2 34.7 51.8 64.3 67.1 75.2 71.4 67.1 60.5 ...
As you can see, the dimensions (200 obs. of 5 variables) are exactly the same as those of the input data.frame.
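The equal dimensions rely on the panel being balanced. In an unbalanced panel, drawing as many firms as there are firms gives a resample whose number of rows depends on which firms happen to be drawn. A minimal sketch on a hypothetical toy panel (unbal and its values are invented for illustration), using the same rbind approach as above:

```r
set.seed(7)
# Hypothetical unbalanced panel: firm 1 has 3 years, firm 2 has 1, firm 3 has 2
unbal <- data.frame(firm = c(1, 1, 1, 2, 3, 3),
                    year = c(1935, 1936, 1937, 1935, 1935, 1936))

ids   <- unique(unbal$firm)
drawn <- sample(ids, size = length(ids), replace = TRUE)
res   <- do.call(rbind, lapply(drawn, function(x) unbal[unbal$firm == x, ]))

# The row count is the sum of the block sizes of the drawn firms,
# which need not equal the original 6 rows
rows_per_firm <- table(unbal$firm)
expected_rows <- sum(rows_per_firm[as.character(drawn)])
cat("resampled rows:", nrow(res), "; original rows:", nrow(unbal), "\n")
```

If myfunction assumes a fixed sample size, this is worth keeping in mind before moving to unbalanced data.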
For your data, where the block index is the country column, the solution would be:
myfunc <- function(n,df) { #define function
unique_firms <- unique(n) #unique countries
print(unique_firms) #show the unique countries (for checking)
sample_firms <- sample(unique_firms, size=length(unique_firms), replace=TRUE) #choose from unique countries randomly with replacement
new_df <- do.call(rbind, lapply(sample_firms, function(x) df[df$country==x,] )) #fetch all years for each randomly picked country and rbind
return(new_df) #return the resampled data frame
}
And the output:
> str(a)
'data.frame': 848 obs. of 18 variables:
$ isocode : Factor w/ 106 levels "AGO","ALB","ARG",..: 82 82 82 82 82 82 82 82 61 61 ...
$ time : int 2 3 4 5 6 7 8 9 2 3 ...
$ country : num 80 80 80 80 80 80 80 80 59 59 ...
$ year : int 1975 1980 1985 1990 1995 2000 2005 2010 1975 1980 ...
$ gdp : num 184619 210169 199343 268870 305255 ...
$ pop : num 33.4 34.9 36.6 37.8 38.3 ...
$ gdp_k : num 5526 6022 5443 7117 7969 ...
$ co2 : num 340353 431436 426881 431052 350874 ...
$ co2_k : num 10191 12333 11674 11407 9128 ...
$ oecd : int 1 1 1 1 1 1 1 1 1 1 ...
$ LI : int 0 0 0 0 0 0 0 0 0 0 ...
$ LMI : int 0 0 0 0 0 0 0 0 0 0 ...
$ UMI : int 0 0 0 0 0 0 0 0 0 0 ...
$ HI : int 1 1 1 1 1 1 1 1 1 1 ...
$ gdpk : num 5531 6018 5449 7118 7971 ...
$ co2k : num 10196 12355 11668 11412 9162 ...
$ co2_k.lag: num 8595 10191 12333 11674 11407 ...
$ gdp_k.lag: num 4730 5526 6022 5443 7117 ...
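As for the more efficient alternative the question asks about, and for wiring the resampler into the question's boot() call: one option is to split the row numbers by firm once, up front, so each replicate is a single subsetting call rather than one comparison plus rbind per drawn firm. This is a minimal sketch, not a benchmarked solution; make_block_gen, panel and slope_stat are hypothetical names, the panel is simulated, and slope_stat merely stands in for the question's myfunction(). It relies on boot() calling ran.gen(data, mle) and then the statistic on the result when sim = "parametric".

```r
library(boot)

# Build a ran.gen for boot(sim = "parametric") that resamples whole firms.
# Assumes boot() passes the same data set (the one given here) on every call.
make_block_gen <- function(df, id_col) {
  blocks <- split(seq_len(nrow(df)), df[[id_col]])  # firm -> its row numbers
  function(data, mle) {
    picked <- sample.int(length(blocks), length(blocks), replace = TRUE)
    data[unlist(blocks[picked], use.names = FALSE), , drop = FALSE]
  }
}

# Hypothetical balanced panel shaped like Grunfeld (10 firms x 20 years)
set.seed(123)
panel <- data.frame(firm    = rep(1:10, each = 20),
                    year    = rep(1935:1954, times = 10),
                    inv     = rnorm(200, mean = 100, sd = 20),
                    capital = rnorm(200, mean = 50, sd = 10))

# Hypothetical statistic: OLS slope of inv on capital in the resample
slope_stat <- function(data) {
  coef(lm(inv ~ capital, data = data))[["capital"]]
}

firm_gen <- make_block_gen(panel, "firm")
one_draw <- firm_gen(panel, NULL)
dim(one_draw)  # 200 4, as in the original panel

result.boot <- boot(panel, slope_stat, R = 500, sim = "parametric",
                    ran.gen = firm_gen, mle = NULL)
result.boot    # original estimate, bootstrap bias and std. error
```

For real use, replace panel with Grunfeld (or your own data) and slope_stat with the question's boot.fun; result.boot$t then holds the 500 bootstrap replicates.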