等效于 R 中用于 Montecarlo 模拟的 Stata 命令`simulate` [英] Equivalent of Stata command `simulate` in R for Montecarlo Simulation
问题描述
我正在 R 中寻找极其方便的 Stata 命令 simulate
的等效函数.该命令基本上允许您声明一个 program
(在下面的示例中为 reg_simulation
),然后从 simulate
调用这样的程序并存储所需的输出.
I am searching for an equivalent function in R of the extremely convenient Stata command simulate
. The command basically allows you to declare a program
(reg_simulation
in the example below) and then invoke such a program from simulate
and store desired outputs.
下面是 simulate
程序使用的 Stata 说明,以及我尝试使用 R
复制它.
Below is a Stata illustration of the usage of the simulate
program, together with my attempt to replicate it using R
.
最后,我的主要问题是:这是 R 用户将如何运行 Montecarlo 模拟?还是我在结构或速度瓶颈方面遗漏了什么?在此先感谢您.
Finally, my main question is: is this how R users will run a Montecarlo simulation? or am I missing something in terms of structure or speed bottlenecks? Thank you a lot in advance.
- 定义
reg_simulation
程序.
clear all
*Define "reg_simulation" to be used later on by "simulate" command
program reg_simulation, rclass
*Declaring Stata version
version 13
*Droping all variables on memory
drop _all
*Set sample size (n=100)
set obs 100
*Simulate model
gen x1 = rnormal()
gen x2 = rnormal()
gen y = 1 + 0.5 * x1 + 1.5 *x2 + rnormal()
*Estimate OLS
reg y x1 x2
*Store coefficients
matrix B = e(b)
return matrix betas = B
end
- 从
simulate
命令调用reg_simulation
:
- Calling
reg_simulation
fromsimulate
command:
*Seet seed
set seed 1234
*Run the actual simulation 10 times using "reg_simulation"
simulate , reps(10) nodots: reg_simulation
- 获得的结果(存储在内存中的数据)
_b_x1 _b_x2 _b_cons
.4470155 1.50748 1.043514
.4235979 1.60144 1.048863
.5006762 1.362679 .8828927
.5319981 1.494726 1.103693
.4926634 1.476443 .8611253
.5920001 1.557737 .8391003
.5893909 1.384571 1.312495
.4721891 1.37305 1.017576
.7109139 1.47294 1.055216
.4197589 1.442816 .9404677
上面Stata程序的R复制.
使用 R 我设法获得了以下内容(不是 R 专家).然而,最让我担心的部分是 for 循环结构,它在每个重复次数 nreps
上循环.
- 定义
reg_simulation
函数.
#Defining a function
reg_simulation<- function(obs = 1000){
data <- data.frame(
#Generate data
x1 <-rnorm(obs, 0 , 1) ,
x2 <-rnorm(obs, 0 , 1) ,
y <- 1 + 0.5* x1 + 1.5 * x2 + rnorm(obs, 0 , 1) )
#Estimate OLS
ols <- lm(y ~ x1 + x2, data=data)
return(ols$coefficients)
}
- 使用 for 循环结构调用
reg_simulation
10 次:
- Calling
reg_simulation
10 times using a for-loop structure:
#Generate list to store results from simulation
results_list <- list()
# N repetitions
nreps <- 10
for (i in 1:nreps) {
#Set seed internally (to get different values in each run)
set.seed(i)
#Save results into list
results_list[i] <- list(reg_simulation(obs=1000))
}
#unlist results
df_results<- data.frame(t(sapply(results_list,
function(x) x[1:max(lengths(results_list))])))
- 获得的结果:
df_results
.
#final results
df_results
# X.Intercept. x1 x2
# 1 1.0162384 0.5490488 1.522017
# 2 1.0663263 0.4989537 1.496758
# 3 0.9862365 0.5144083 1.462388
# 4 1.0137042 0.4767466 1.551139
# 5 0.9996164 0.5020535 1.489724
# 6 1.0351182 0.4372447 1.444495
# 7 0.9975050 0.4809259 1.525741
# 8 1.0286192 0.5253288 1.491966
# 9 1.0107962 0.4659812 1.505793
# 10 0.9765663 0.5317318 1.501162
推荐答案
您走对了.一些提示/更正:
You're on the right track. Couple of hints/corrections:
- 不要在
data.frame()
中使用
<-
在 R 中,我们使用 =
构造数据框进行内部列赋值,即 data.frame(x = 1:10, y = 11:20)
而不是data.frame(x <- 1:10, y <- 11:20)
.
In R, we construct data frames using =
for internal column assignment, i.e. data.frame(x = 1:10, y = 11:20)
rather than data.frame(x <- 1:10, y <- 11:20)
.
(关于 <,还有 更多要说的内容;-
vs =
,但我不想分散你对主要问题的注意力.)
(There's more to be said about <-
vs =
, but I don't want to distract from your main question.)
在您的情况下,您实际上甚至不需要创建数据框,因为 x1
、x2
和 y
都将被识别作为全球"函数范围内的变量.我将在回答的末尾发布一些代码来证明这一点.
In your case, you don't actually even need to create a data frame since x1
, x2
and y
will all be recognized as "global" variables within the scope of the function. I'll post some code at the end of my answer demonstrating this.
- 在 R 中通过 for 循环增长列表时,始终尝试先预分配列表
- When growing a list via a for loop in R, always try to pre-allocate the list first
如果您要增加(长)for 循环,请始终尝试预先分配列表长度和类型.原因:这样,R 就知道可以有效地为您的对象分配多少内存.如果您只做 10 次重复,那意味着从以下内容开始:
Always try to pre-allocate the list length and type if you are going to grow a (long) for loop. Reason: That way, R knows how much memory to efficiently allocate to your object. In the case where you are only doing 10 reps, that would mean starting with something like:
results_list <- vector("list", 10)
3.考虑使用 lapply
而不是 for
3. Consider using lapply
instead of for
for 循环在 R 中有一些不好的表现.(有点不公平,但这是另一天的故事.)许多 R 用户会考虑的替代方案是 <提供的函数式编程方法代码>重叠代码>.我将暂缓向您展示代码,但它看起来与 for 循环非常相似.需要快速注意的是,从第 2 点开始,一个直接的好处是您不需要使用 lapply.
for loops have a bit of bad rep in R. (Somewhat unfairly, but that's a story for another day.) An alternative that many R users would consider is the functional programming approach offered by lapply
. I'll hold off on showing you the code for a second, but it will look very similar to a for loop. Just to note quickly, following on from point 2, that one immediate benefit is that you don't need to pre-allocate the list with lapply.
4.并行运行大循环
蒙特卡罗模拟是并行运行所有内容的理想候选者,因为每次迭代都应该独立于其他迭代.在 R 中实现并行的一种简单方法是通过 future.apply
包.
A Monte Carlo simulation is an ideal candidate for running everything in parallel, since each iteration is supposed to be independent of the others. An easy way to go parallel in R is via the future.apply
package.
将所有内容放在一起,这就是我可能会如何进行模拟.请注意,这可能更高级".比你可能需要的多,但既然我在这里...
Putting everything together, here's how I'd probably do your simulation. Note that this might be more "advanced" than you possibly need, but since I'm here...
library(data.table) ## optional, but what I'll use to coerce the list into a DT
library(future.apply) ## for parallel stuff
plan(multisession) ## use all available cores
obs <- 1e3
# Defining a function
reg_simulation <- function(...){
x1 <- rnorm(obs, 0 , 1)
x2 <- rnorm(obs, 0 , 1)
y <- 1 + 0.5* x1 + 1.5 * x2 + rnorm(obs, 0 , 1)
#Estimate OLS
ols <- lm(y ~ x1 + x2)
# return(ols$coefficients)
return(as.data.frame(t(ols$coefficients)))
}
# N repetitions
nreps <- 10
## Serial version
# results <- lapply(1:nreps, reg_simulation)
## Parallel version
results <- future_lapply(1:nreps, reg_simulation, future.seed = 1234L)
## Unlist / convert into a data.table
results <- rbindlist(results)
这篇关于等效于 R 中用于 Montecarlo 模拟的 Stata 命令`simulate`的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!