C++ builtin random artifacts in Rcpp

Question

I'm maintaining an R package named iRF, and a big problem is that it isn't reproducible. In other words, I cannot get the same result by calling set.seed. For the purpose of this question, let's focus on the function RIT. You don't need to figure out what it does; just look at the RNG handling part.

It is defined in R/RIT.R, which calls either RIT_1class or RIT_2class depending on the input type. Both RIT_[1|2]class functions are defined in src/ExportedFunctionsRIT.cpp, which in turn calls helper functions defined in src/RITmain.h and src/RITaux.h.

I'm using Rcpp attributes, so randomness in RIT_[1|2]class should be correctly handled by an implicit RNGScope, as mentioned in this answer. However, this codebase is tricky to tackle in two ways:

  1. The functions RIT_basic and RIT_minhash use // [[Rcpp::plugins(openmp)]]. Fortunately, the original author gives each thread a separate seed, so I hoped I could make it deterministic with seeds[i] = rand() * (i+1). As you can tell from the fact that I'm asking here, this alone isn't enough.

// Set up vector of seeds for RNG
vector<unsigned int> seeds(n_cores);
for (int i=0; i<n_cores; i++) {
  seeds[i] = chrono::high_resolution_clock::now().time_since_epoch().count()*(i+1);
}

  2. One of the functions, CreateHt, uses random_device rd;. I'm not familiar with C++, but a quick search reveals that it generates "non-deterministic random numbers".

void CreateHt(...) {
  // Ht is p by L
  random_device rd; //seed for Random Number Generator(RNG)
  mt19937_64 mt(rd()); //Use Mersenne Twister as RNG

  ...
    shuffle(perm.begin(), perm.end(), mt);
  ...
}

From my understanding, both rand() and random_device are C++'s builtin random artifacts. How can I make them respect .Random.seed?

Recommended answer

You should not use rand(), cf. https://channel9.msdn.com/Events/GoingNative/2013/rand-Considered-Harmful. In particular, rand() is not thread-safe, so combining it with OpenMP will not work. However, switching to C++11's random header is not a good idea either, since its use is discouraged by Writing R Extensions (WRE). No reason is given, but a likely one is that the distribution functions are implementation-defined.
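
To make the portability concern concrete: the output of engines such as std::mt19937_64 is fixed by the C++ standard, whereas the algorithms behind the distributions and std::shuffle are left to each standard library, so the same seed can give different results on different platforms. The following is only a minimal sketch, assuming one keeps a seeded standard engine anyway and wants the shuffle from the question to be reproducible across platforms. The helper names bounded_draw and portable_shuffle are hypothetical and are not part of iRF, Rcpp, or R.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: unbiased integer in [0, n) built only from raw engine
// output, so it does not depend on any implementation-defined distribution.
std::uint64_t bounded_draw(std::mt19937_64& eng, std::uint64_t n) {
  assert(n > 0);
  const std::uint64_t max = std::numeric_limits<std::uint64_t>::max();
  // limit is a multiple of n; draws at or above it are rejected so that
  // every residue class modulo n is equally likely.
  const std::uint64_t limit = max - (max % n);
  std::uint64_t x;
  do {
    x = eng();
  } while (x >= limit);
  return x % n;
}

// Hypothetical helper: hand-written Fisher-Yates shuffle, used instead of
// std::shuffle, whose use of the engine is not specified by the standard.
template <typename T>
void portable_shuffle(std::vector<T>& v, std::mt19937_64& eng) {
  for (std::size_t i = v.size(); i > 1; --i) {
    // Pick j uniformly from [0, i) and move that element to position i - 1.
    const std::size_t j = static_cast<std::size_t>(bounded_draw(eng, i));
    std::swap(v[i - 1], v[j]);
  }
}
```

With these helpers, two engines constructed from the same seed produce the same permutation. Where the seed comes from is a separate question, addressed by the alternatives listed next.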

Possible alternatives:

  • Use R's RNG. Rcpp provides many wrapper functions in the R and Rcpp namespaces. In addition, R_unif_index is helpful for getting an unbiased integer within a range.

  • Use the RNGs from boost.random provided by the BH package. Seed them with a call to R's RNG to make everything reproducible.

  • Use alternative packages like rTRNG, sitmo or my own dqrng. This is particularly helpful in the context of parallel RNGs. Seeding via R's RNG can be used here as well.
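
The "seed from R's RNG" idea shared by the last two options can be sketched in plain C++. This sketch assumes that a single base seed has already been drawn from R's RNG on the main thread, before the parallel region starts (that step is not shown here, and R's API should not be called from worker threads). It then expands that value into one seed per thread, replacing the clock-based seeds in the question. The name make_thread_seeds is hypothetical, and std::mt19937_64 stands in for whichever engine is actually chosen.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: expand one base seed into n_threads seeds.
// The same base seed always yields the same vector of seeds.
std::vector<std::uint64_t> make_thread_seeds(std::uint64_t base_seed,
                                             int n_threads) {
  assert(n_threads > 0);
  std::mt19937_64 seeder(base_seed);  // engine output is fixed by the standard
  std::vector<std::uint64_t> seeds(static_cast<std::size_t>(n_threads));
  for (auto& s : seeds) {
    s = seeder();  // one raw 64-bit draw per thread
  }
  return seeds;
}
```

Each thread i would then construct its own engine from seeds[i]. Note that separately seeded engines are deterministic but not guaranteed to produce non-overlapping streams, which is one reason the parallel RNG packages mentioned above may be preferable.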