等效于 R 中用于 Montecarlo 模拟的 Stata 命令`simulate` [英] Equivalent of Stata command `simulate` in R for Montecarlo Simulation

查看:45
本文介绍了等效于 R 中用于 Montecarlo 模拟的 Stata 命令`simulate`的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在 R 中寻找极其方便的 Stata 命令 simulate 的等效函数.该命令基本上允许您声明一个 program(在下面的示例中为 reg_simulation),然后从 simulate 调用这样的程序并存储所需的输出.

I am searching for an equivalent function in R of the extremely convenient Stata command simulate. The command basically allows you to declare a program (reg_simulation in the example below) and then invoke such a program from simulate and store desired outputs.

下面是 simulate 程序使用的 Stata 说明,以及我尝试使用 R 复制它.

Below is a Stata illustration of the usage of the simulate program, together with my attempt to replicate it using R.

最后,我的主要问题是:这是 R 用户将如何运行 Montecarlo 模拟?还是我在结构或速度瓶颈方面遗漏了什么?在此先感谢您.

Finally, my main question is: is this how R users will run a Montecarlo simulation? or am I missing something in terms of structure or speed bottlenecks? Thank you a lot in advance.

  1. 定义reg_simulation 程序.

clear all
*Define "reg_simulation" to be used later on by "simulate" command 
program reg_simulation, rclass
    *Declaring Stata version
    version 13
    *Droping all variables on memory
    drop _all
    *Set sample size (n=100)
    set obs 100
    *Simulate model
    gen x1 = rnormal()
    gen x2 = rnormal()
    gen y = 1 + 0.5 * x1 + 1.5 *x2 + rnormal()
    *Estimate OLS
    reg y x1 x2 
    *Store coefficients
    matrix B = e(b)
    return matrix betas = B 
end

  1. simulate 命令调用 reg_simulation:
  1. Calling reg_simulation from simulate command:

*Seet seed
set seed 1234
*Run the actual simulation 10 times using "reg_simulation"
simulate , reps(10) nodots: reg_simulation

  1. 获得的结果(存储在内存中的数据)

_b_x1   _b_x2   _b_cons
.4470155    1.50748     1.043514
.4235979    1.60144     1.048863
.5006762    1.362679    .8828927
.5319981    1.494726    1.103693
.4926634    1.476443    .8611253
.5920001    1.557737    .8391003
.5893909    1.384571    1.312495
.4721891    1.37305     1.017576
.7109139    1.47294     1.055216
.4197589    1.442816    .9404677

上面Stata程序的R复制.

使用 R 我设法获得了以下内容(不是 R 专家).然而,最让我担心的部分是 for 循环结构,它在每个重复次数 nreps 上循环.

  1. 定义reg_simulation 函数.

#Defining a function 
reg_simulation<- function(obs = 1000){
    data <- data.frame(
    #Generate data
    x1 <-rnorm(obs, 0 , 1) ,
    x2 <-rnorm(obs, 0 , 1) ,
    y <- 1 + 0.5* x1 + 1.5 * x2 + rnorm(obs, 0 , 1) )
  #Estimate OLS
  ols <- lm(y ~ x1 + x2, data=data)  
  return(ols$coefficients)  
}

  1. 使用 for 循环结构调用 reg_simulation 10 次:
  1. Calling reg_simulation 10 times using a for-loop structure:

#Generate list to store results from simulation
results_list <- list()
# N repetitions
nreps <- 10
for (i in 1:nreps) {
  #Set seed internally (to get different values in each run)
  set.seed(i)
  #Save results into list
  results_list[i]  <- list(reg_simulation(obs=1000))  
}
#unlist results
df_results<- data.frame(t(sapply(results_list, 
                       function(x) x[1:max(lengths(results_list))])))

  1. 获得的结果:df_results.

#final results
df_results
#   X.Intercept.  x1        x2
# 1     1.0162384 0.5490488 1.522017
# 2     1.0663263 0.4989537 1.496758
# 3     0.9862365 0.5144083 1.462388
# 4     1.0137042 0.4767466 1.551139
# 5     0.9996164 0.5020535 1.489724
# 6     1.0351182 0.4372447 1.444495
# 7     0.9975050 0.4809259 1.525741
# 8     1.0286192 0.5253288 1.491966
# 9     1.0107962 0.4659812 1.505793
# 10    0.9765663 0.5317318 1.501162

推荐答案

您走对了.一些提示/更正:

You're on the right track. Couple of hints/corrections:

  1. 不要在data.frame()
  2. 中使用<-

在 R 中,我们使用 = 构造数据框进行内部列赋值,即 data.frame(x = 1:10, y = 11:20) 而不是data.frame(x <- 1:10, y <- 11:20).

In R, we construct data frames using = for internal column assignment, i.e. data.frame(x = 1:10, y = 11:20) rather than data.frame(x <- 1:10, y <- 11:20).

(关于 <,还有 更多要说的内容;- vs =,但我不想分散你对主要问题的注意力.)

(There's more to be said about <- vs =, but I don't want to distract from your main question.)

在您的情况下,您实际上甚至不需要创建数据框,因为 x1x2y 都将被识别作为全球"函数范围内的变量.我将在回答的末尾发布一些代码来证明这一点.

In your case, you don't actually even need to create a data frame since x1, x2 and y will all be recognized as "global" variables within the scope of the function. I'll post some code at the end of my answer demonstrating this.

  1. 在 R 中通过 for 循环增长列表时,始终尝试先预分配列表
  1. When growing a list via a for loop in R, always try to pre-allocate the list first

如果您要增加(长)for 循环,请始终尝试预先分配列表长度和类型.原因:这样,R 就知道可以有效地为您的对象分配多少内存.如果您只做 10 次重复,那意味着从以下内容开始:

Always try to pre-allocate the list length and type if you are going to grow a (long) for loop. Reason: That way, R knows how much memory to efficiently allocate to your object. In the case where you are only doing 10 reps, that would mean starting with something like:

results_list <- vector("list", 10)

3.考虑使用 lapply 而不是 for

3. Consider using lapply instead of for

for 循环在 R 中有一些不好的表现.(有点不公平,但这是另一天的故事.)许多 R 用户会考虑的替代方案是 <提供的函数式编程方法代码>重叠.我将暂缓向您展示代码,但它看起来与 for 循环非常相似.需要快速注意的是,从第 2 点开始,一个直接的好处是您不需要使用 lapply.

for loops have a bit of bad rep in R. (Somewhat unfairly, but that's a story for another day.) An alternative that many R users would consider is the functional programming approach offered by lapply. I'll hold off on showing you the code for a second, but it will look very similar to a for loop. Just to note quickly, following on from point 2, that one immediate benefit is that you don't need to pre-allocate the list with lapply.

4.并行运行大循环

蒙特卡罗模拟是并行运行所有内容的理想候选者,因为每次迭代都应该独立于其他迭代.在 R 中实现并行的一种简单方法是通过 future.apply 包.

A Monte Carlo simulation is an ideal candidate for running everything in parallel, since each iteration is supposed to be independent of the others. An easy way to go parallel in R is via the future.apply package.

将所有内容放在一起,这就是我可能会如何进行模拟.请注意,这可能更高级".比你可能需要的多,但既然我在这里...

Putting everything together, here's how I'd probably do your simulation. Note that this might be more "advanced" than you possibly need, but since I'm here...

library(data.table)   ## optional, but what I'll use to coerce the list into a DT
library(future.apply) ## for parallel stuff
plan(multisession)    ## use all available cores

obs <- 1e3

# Defining a function 
reg_simulation <- function(...){
    x1 <- rnorm(obs, 0 , 1)
    x2 <- rnorm(obs, 0 , 1)
    y <- 1 + 0.5* x1 + 1.5 * x2 + rnorm(obs, 0 , 1)
    #Estimate OLS
    ols <- lm(y ~ x1 + x2)  
    
    # return(ols$coefficients)
    return(as.data.frame(t(ols$coefficients)))
}

# N repetitions
nreps <- 10

## Serial version
# results  <- lapply(1:nreps, reg_simulation)

## Parallel version
results  <- future_lapply(1:nreps, reg_simulation, future.seed = 1234L)

## Unlist / convert into a data.table
results <- rbindlist(results)

这篇关于等效于 R 中用于 Montecarlo 模拟的 Stata 命令`simulate`的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆