将R的种子从Rcpp更改为保证可重复性 [英] Changing R's Seed from Rcpp to Guarantee Reproducibility

查看:140
本文介绍了将R的种子从Rcpp更改为保证可重复性的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试在rcpp中编写函数r(d,n)。函数从正态分布N(0,d)返回n次随机抽奖。此函数应定义得很好,因此,只要d和n不改变其值,该函数应返回相同的绘制。

I am trying to write a function r(d, n) in rcpp. The function returns n random draws from normal distribution N(0, d). This function should be well defined, therefore the function should return the same draws whenever the d and n do not change their value.

如果d限制为整数,这将不是问题,在这种情况下,我可以设置种子并完成工作

This won't be a problem if d is restricted to be integer, in which case I can set seed and do the job

// set seed
// [[Rcpp::export]]
void set_seed(unsigned int seed) {
  Rcpp::Environment base_env("package:base");
  Rcpp::Function set_seed_r = base_env["set.seed"];
  set_seed_r(seed);  
}

// function r(d, n)
// [[Rcpp::export]]
vec randdraw(int d, int n){
  set_seed(d);
  vec out = randn(n);
  return out;
}

但显然我不想将d限制为整数。理想情况下,d应该是两倍。有什么想法吗?谢谢!

But clearly I don't want to restrict d to be integer. Ideally d should be double. Any thoughts? Thank you!

推荐答案

我认为正在发生的问题是您试图分散 randn 由 Armadillo 提供,仅限于标准法线,例如N(0,1),使其与N(0,d)匹配。有两种方法可以解决此问题,因为它是标准的法线。

The issue that I think is happening is you are trying to disperse the randn offered by Armadillo that is restricted to being a standard normal, e.g. N(0,1), such that it matches N(0, d). There are two ways to go about this since it is a standard normal.

第一种方法是将样本乘以 d ,例如 sqrt(d)* sample 。由于方差和期望的随机变量属性使之成为sqrt(d)* N(0,1)〜N(0,sqrt(d)^ 2)〜N(0,d),因此这是可能的。

The first way involves just multiplying the sample by the square root of d, e.g. sqrt(d)*sample. This is possible due to the random variable properties of variance and expectation giving sqrt(d)*N(0, 1) ~ N(0, sqrt(d)^2) ~ N(0, d).

这里要注意的更重要的事情之一是 set_seed()函数自Armadillo 配置 RcppArmadillo 挂接到 R 的RNG库,以访问 :: Rf_runif 函数以生成随机值。唯一值得关注的领域是,由于 arma :: arma_rng :: set_seed()来设置种子。 http://cran.r-project.org/doc/manuals/r-devel/R-exts.html#Random-numbers rel = nofollow noreferrer>编写R扩展的6.3节。如果您确实使用此功能,则会被警告

One of the more important things to note here is that the set_seed() function will work since the Armadillo configuration of RcppArmadillo hooks into R's RNG library to access the ::Rf_runif function to generate random values. The only area of concern is you cannot use arma::arma_rng::set_seed() to set the seed due to limitations of the R/C++ interaction detailed in Section 6.3 of Writing R Extensions. If you do use this, then you would get warned with :


从R调用时,必须通过set在R级别设置RNG种子。 seed()

When called from R, the RNG seed has to be set at the R level via set.seed()

在检测到的第一个呼叫上。

on the first detected call.

说,这是一个简短的代码示例,其中我们乘以 sqrt(d)

With this being said, here is a short code example where we multiple by sqrt(d).

代码:

#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

// set seed
// [[Rcpp::export]]
void set_seed(double seed) {
    Rcpp::Environment base_env("package:base");
    Rcpp::Function set_seed_r = base_env["set.seed"];
    set_seed_r(std::floor(std::fabs(seed)));
}

// function r(d, n)
// [[Rcpp::export]]
arma::vec randdraw(double d, int n){
    set_seed(d);              // Set a seed for R's RNG library
    // Call Armadillo's RNG procedure that references R's RNG capabilities
    // and change dispersion slightly.
    arma::vec out = std::sqrt(std::fabs(d))*arma::randn(n);
    return out;
}

输出:

> randdraw(3.5, 5L)
           [,1]
[1,] -0.8671559
[2,] -1.9507540
[3,]  2.9025090
[4,] -1.2953745
[5,]  2.0799176

注意:与 rnorm 过程不同于 arma :: randn 世代。

Note: There is no direct equivalent as the rnorm procedure differs from the arma::randn generation.

第二种也是更好的解决方案是显式依赖 R 的RNG功能。以前,由于 RcppArmadillo 的配置,我们对 R 的RNG库进行了隐式的使用。我倾向于使用这种方法,因为您已经假设使用 set_seed()函数免责声明:我写了这篇文章)。如果您担心 d 的限制是整数,那么 int 可以与 std :: floor(std :: fabs(seed))。使用 Rcpp :: r *() R :: r *()生成值后,犰狳矢量是使用高级ctor 创建的,它可以重用现有的内存分配。

The second, and significantly better solution, is to explicitly rely upon R's RNG functions. Previously, we made an implicit use of R's RNG library due to RcppArmadillo's configuration. I tend to prefer this approach as you have already made an assumption that the code is specific to R when using the set_seed() function (Disclaimer: I wrote the post). If you are worried about the restriction of d being an integer, a slight coercion from double to int is possible with std::floor(std::fabs(seed)). Once the values are generated using either Rcpp::r*() or R::r*() , an armadillo vector is created using an advanced ctor that reuses the existing memory allocation.

代码:

#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

// set seed
// [[Rcpp::export]]
void set_seed(double seed) {
    Rcpp::Environment base_env("package:base");
    Rcpp::Function set_seed_r = base_env["set.seed"];
    set_seed_r(std::floor(std::fabs(seed)));
}

// function r(d, n)
// [[Rcpp::export]]
arma::vec randdraw(double d, int n){
    set_seed(d);                                      // Set a seed for R's RNG library
    Rcpp::NumericVector draws = Rcpp::rnorm(n, 0.0, d); // Hook into R's Library
    // Use Armadillo's advanced CTOR to re-use memory and cast as an armadillo object.
    arma::vec out = arma::vec(draws.begin(), n, false, true);
    return out;
}

输出:

> randdraw(3.21,10)
             [,1]
 [1,] -3.08780627
 [2,] -0.93900757
 [3,]  0.83071017
 [4,] -3.69834335
 [5,]  0.62846287
 [6,]  0.09669786
 [7,]  0.27419092
 [8,]  3.58431878
 [9,] -3.91253230
[10,]  4.06825360
> set.seed(3)
> rnorm(10, 0, 3.21)
 [1] -3.08780627 -0.93900757  0.83071017 -3.69834335  0.62846287  0.09669786  0.27419092  3.58431878 -3.91253230  4.06825360

这篇关于将R的种子从Rcpp更改为保证可重复性的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆