将R的种子从Rcpp更改为保证可重复性 [英] Changing R's Seed from Rcpp to Guarantee Reproducibility
问题描述
我正在尝试在rcpp中编写函数r(d,n)。函数从正态分布N(0,d)返回n次随机抽奖。此函数应定义得很好,因此,只要d和n不改变其值,该函数应返回相同的绘制。
I am trying to write a function r(d, n) in rcpp. The function returns n random draws from normal distribution N(0, d). This function should be well defined, therefore the function should return the same draws whenever the d and n do not change their value.
如果d限制为整数,这将不是问题,在这种情况下,我可以设置种子并完成工作
This won't be a problem if d is restricted to be integer, in which case I can set seed and do the job
// set seed
// [[Rcpp::export]]
void set_seed(unsigned int seed) {
Rcpp::Environment base_env("package:base");
Rcpp::Function set_seed_r = base_env["set.seed"];
set_seed_r(seed);
}
// function r(d, n)
// [[Rcpp::export]]
vec randdraw(int d, int n){
set_seed(d);
vec out = randn(n);
return out;
}
但显然我不想将d限制为整数。理想情况下,d应该是两倍。有什么想法吗?谢谢!
But clearly I don't want to restrict d to be integer. Ideally d should be double. Any thoughts? Thank you!
推荐答案
我认为正在发生的问题是您试图分散 randn 由 Armadillo 提供,仅限于标准法线,例如N(0,1),使其与N(0,d)匹配。有两种方法可以解决此问题,因为它是标准的法线。
The issue that I think is happening is you are trying to disperse the randn
offered by Armadillo that is restricted to being a standard normal, e.g. N(0,1), such that it matches N(0, d). There are two ways to go about this since it is a standard normal.
第一种方法是将样本乘以 d
,例如 sqrt(d)* sample
。由于方差和期望的随机变量属性使之成为sqrt(d)* N(0,1)〜N(0,sqrt(d)^ 2)〜N(0,d),因此这是可能的。
The first way involves just multiplying the sample by the square root of d
, e.g. sqrt(d)*sample
. This is possible due to the random variable properties of variance and expectation giving sqrt(d)*N(0, 1) ~ N(0, sqrt(d)^2) ~ N(0, d).
这里要注意的更重要的事情之一是 set_seed()
函数自Armadillo 配置 RcppArmadillo 挂接到 R 的RNG库,以访问 :: Rf_runif
函数以生成随机值。唯一值得关注的领域是,由于
One of the more important things to note here is that the set_seed()
function will work since the Armadillo configuration of RcppArmadillo hooks into R's RNG library to access the ::Rf_runif
function to generate random values. The only area of concern is you cannot use arma::arma_rng::set_seed()
to set the seed due to limitations of the R/C++ interaction detailed in Section 6.3 of Writing R Extensions. If you do use this, then you would get warned with :
从R调用时,必须通过set在R级别设置RNG种子。 seed()
When called from R, the RNG seed has to be set at the R level via set.seed()
在检测到的第一个呼叫上。
on the first detected call.
说,这是一个简短的代码示例,其中我们乘以 sqrt(d)
。
With this being said, here is a short code example where we multiple by sqrt(d)
.
代码:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// set seed
// [[Rcpp::export]]
void set_seed(double seed) {
Rcpp::Environment base_env("package:base");
Rcpp::Function set_seed_r = base_env["set.seed"];
set_seed_r(std::floor(std::fabs(seed)));
}
// function r(d, n)
// [[Rcpp::export]]
arma::vec randdraw(double d, int n){
set_seed(d); // Set a seed for R's RNG library
// Call Armadillo's RNG procedure that references R's RNG capabilities
// and change dispersion slightly.
arma::vec out = std::sqrt(std::fabs(d))*arma::randn(n);
return out;
}
输出:
> randdraw(3.5, 5L)
[,1]
[1,] -0.8671559
[2,] -1.9507540
[3,] 2.9025090
[4,] -1.2953745
[5,] 2.0799176
注意:与 rnorm
过程不同于 arma :: randn
世代。
Note: There is no direct equivalent as the rnorm
procedure differs from the arma::randn
generation.
第二种也是更好的解决方案是显式依赖 R 的RNG功能。以前,由于 RcppArmadillo 的配置,我们对 R 的RNG库进行了隐式的使用。我倾向于使用这种方法,因为您已经假设使用 set_seed()
函数(免责声明:我写了这篇文章强>)。如果您担心 d
的限制是整数
,那么
到 int
可以与 std :: floor(std :: fabs(seed))
。使用 Rcpp :: r *()
或 R :: r *()
生成值后,犰狳矢量是使用高级ctor 创建的,它可以重用现有的内存分配。
The second, and significantly better solution, is to explicitly rely upon R's RNG functions. Previously, we made an implicit use of R's RNG library due to RcppArmadillo's configuration. I tend to prefer this approach as you have already made an assumption that the code is specific to R when using the set_seed()
function (Disclaimer: I wrote the post). If you are worried about the restriction of d
being an integer
, a slight coercion from double
to int
is possible with std::floor(std::fabs(seed))
. Once the values are generated using either Rcpp::r*()
or R::r*()
, an armadillo vector is created using an advanced ctor that reuses the existing memory allocation.
代码:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// set seed
// [[Rcpp::export]]
void set_seed(double seed) {
Rcpp::Environment base_env("package:base");
Rcpp::Function set_seed_r = base_env["set.seed"];
set_seed_r(std::floor(std::fabs(seed)));
}
// function r(d, n)
// [[Rcpp::export]]
arma::vec randdraw(double d, int n){
set_seed(d); // Set a seed for R's RNG library
Rcpp::NumericVector draws = Rcpp::rnorm(n, 0.0, d); // Hook into R's Library
// Use Armadillo's advanced CTOR to re-use memory and cast as an armadillo object.
arma::vec out = arma::vec(draws.begin(), n, false, true);
return out;
}
输出:
> randdraw(3.21,10)
[,1]
[1,] -3.08780627
[2,] -0.93900757
[3,] 0.83071017
[4,] -3.69834335
[5,] 0.62846287
[6,] 0.09669786
[7,] 0.27419092
[8,] 3.58431878
[9,] -3.91253230
[10,] 4.06825360
> set.seed(3)
> rnorm(10, 0, 3.21)
[1] -3.08780627 -0.93900757 0.83071017 -3.69834335 0.62846287 0.09669786 0.27419092 3.58431878 -3.91253230 4.06825360
这篇关于将R的种子从Rcpp更改为保证可重复性的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!