C++ builtin random artifacts in Rcpp
Question
I'm maintaining an R package named iRF, and a big problem is that it isn't reproducible. In other words, I cannot get the same result by calling set.seed. For the purpose of this question, let's focus on the function RIT. You don't need to figure out what it does; just look at the RNG handling part instead.
It is defined in R/RIT.R, which calls either RIT_1class or RIT_2class depending on the input type. Both RIT_[1|2]class functions are defined in src/ExportedFunctionsRIT.cpp, which in turn calls helper functions defined in src/RITmain.h and src/RITaux.h.
I'm using Rcpp attributes, so randomness in RIT_[1|2]class should be correctly handled by an implicit RNGScope, as mentioned in this answer. However, this codebase is tricky to tackle in two ways:
- The functions RIT_basic and RIT_minhash use // [[Rcpp::plugins(openmp)]]. Fortunately, the original author gives each thread a separate seed, so hopefully I can make it deterministic with seeds[i] = rand() * (i+1), yet you can tell this alone isn't enough, since I'm asking here.
    // Set up vector of seeds for RNG
    vector<unsigned int> seeds(n_cores);
    for (int i=0; i<n_cores; i++) {
      seeds[i] = chrono::high_resolution_clock::now().time_since_epoch().count()*(i+1);
    }
- One of the functions, CreateHT, uses random_device (random_device rd;). I'm not familiar with C++, but a quick search reveals that it generates "non-deterministic random numbers".
    void CreateHt(...) {
      // Ht is p by L
      random_device rd;    // seed for Random Number Generator (RNG)
      mt19937_64 mt(rd()); // Use Mersenne Twister as RNG
      ...
      shuffle(perm.begin(), perm.end(), mt);
      ...
    }
From my understanding, both rand() and random_device are C++'s builtin random artifacts. How can I make them respect .Random.seed?
Recommended answer
You should not use rand(), cf. https://channel9.msdn.com/Events/GoingNative/2013/rand-Considered-Harmful. In particular, rand() is not thread safe, so combining it with OpenMP will not work. However, going for C++11's random header is not a good idea either, since its usage is discouraged by Writing R Extensions (WRE). No reason is given, but a likely one is that the distribution functions are implementation-defined, so the same seed can produce different results with different compilers or standard libraries.
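One piece that can be made deterministic regardless of which RNG ends up being used is the per-thread seed vector from the question. The following is only a sketch under assumptions: make_thread_seeds is a hypothetical helper (not part of iRF), and SplitMix64 is swapped in here as a simple seed mixer in place of the clock-based multiplication. The master seed is assumed to be drawn once on the main thread, before the OpenMP region, for example from R's RNG so that set.seed controls it.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One step of SplitMix64: advances `state` and returns a mixed 64-bit value.
inline std::uint64_t splitmix64(std::uint64_t& state) {
    std::uint64_t z = (state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

// Hypothetical helper: derive one seed per thread from a single master seed.
// The same master seed always yields the same seed vector.
inline std::vector<std::uint64_t> make_thread_seeds(std::uint64_t master_seed,
                                                    std::size_t n_cores) {
    std::vector<std::uint64_t> seeds(n_cores);
    std::uint64_t state = master_seed;
    for (std::size_t i = 0; i < n_cores; ++i) {
        seeds[i] = splitmix64(state);  // distinct seed for thread i
    }
    return seeds;
}
```

Because the seeds are computed serially before any thread starts, no RNG state is shared between threads at this stage; each thread then only touches its own generator.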
Possible alternatives:
- Use R's RNG. Rcpp provides many wrapper functions in the R and Rcpp namespaces. In addition, R_unif_index is helpful for getting an unbiased integer within a range.
- Use the RNGs from boost.random provided by the BH package. Seed them with a call to R's RNG to make everything reproducible.
- Use alternative packages like rTRNG, sitmo or my own dqrng. This is particularly helpful in the context of parallel RNGs. Seeding via R's RNG can be used here as well.