Deciding between random_device and seed_seq to generate seeds for multiple random number sequences


Question

When writing code that requires multiple independent random number distributions/sequences (example below with two), it seems that there are two typical ways to implement (pseudo-)random number generation. One is simply using a random_device object to generate two random seeds for the two independent engines:

std::random_device rd;
std::mt19937 en(rd());
std::mt19937 en2(rd());
std::uniform_real_distribution<> ureald{min,max};
std::uniform_int_distribution<> uintd{min,max};

The other involves using the random_device object to create a seed_seq object using multiple "sources" of randomness:

// NOTE: keeping this here for history, but a (hopefully) corrected version of
// this implementation is posted below the edit
std::random_device rd;
std::seed_seq seedseq{rd(), rd(), rd()}; // is there an optimal number of rd() to use?
std::vector<uint32_t> seeds(5);
seedseq.generate(seeds.begin(), seeds.end());
std::mt19937 en3(seeds[0]);
std::mt19937 en4(seeds[1]);
std::uniform_real_distribution<> ureald{min,max};
std::uniform_int_distribution<> uintd{min,max};

Out of these two, is there a preferred method? Why? If it is the latter, is there an optimal number of random_device "sources" to use in generating the seed_seq object?

Are there better approaches to random number generation than either of these two implementations I've outlined above?

Thank you!


Edit

(Hopefully) corrected version of seed_seq implementation for multiple distributions:

std::random_device rd;
std::seed_seq seedseq1{rd(), rd(), rd()}; // is there an optimal number of rd() to use?
std::seed_seq seedseq2{rd(), rd(), rd()};
std::mt19937 en3(seedseq1);
std::mt19937 en4(seedseq2);
std::uniform_real_distribution<> ureald{min,max};
std::uniform_int_distribution<> uintd{min,max};

Answer

std::seed_seq is generally intended to be used if you don't trust the default implementation to properly initialize the state of the engine you're using.

Which engine std::default_random_engine names is implementation-defined. In some C++11-and-later implementations (MSVC's standard library, for example) it is an alias for std::mt19937, a specific variant of the Mersenne Twister pseudorandom number generation algorithm; libstdc++ and libc++ alias it to a linear congruential engine instead. Looking at the specification for std::mt19937, we see that its state consists of 624 32-bit unsigned integers, which is enough to hold the 19937 bits of state it is intended to encompass (which is how it gets its name). If you seed it with only a single uint32_t value (which is what you would get from calling rd() once, if rd is a std::random_device object), then at most 32 bits of that state come from your entropy source, which naively sounds like leaving the vast majority of the state uninitialized.

Now, the good news for anyone about to panic about their poorly seeded Mersenne Twister engines is that if you construct a std::mt19937 with a single uint32_t value (like std::mt19937 engine{rd()};), the implementation is required to fill in the rest of the state deterministically from that seed using a fixed recurrence. So while a single invocation of rd() can yield at most 2^32 different starting states, it is still sufficient to initialize the engine properly. This will yield a "good quality" random number generator.
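As a rough illustration of the paragraph above, here is a minimal sketch showing that a single seed value fully determines the engine. The helper names nth_output and same_seed_same_state are made up for this example and do not come from the original post.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>

// Hypothetical helper: returns the n-th output (1-based) of a std::mt19937
// constructed from a single 32-bit seed value.
std::uint32_t nth_output(std::uint32_t seed, std::size_t n) {
    assert(n >= 1);
    std::mt19937 engine(seed);
    engine.discard(n - 1);  // skip the first n-1 outputs
    return static_cast<std::uint32_t>(engine());
}

// Hypothetical helper: two engines built from the same single seed compare
// equal, i.e. the whole 624-word state is derived from that one value.
bool same_seed_same_state(std::uint32_t seed) {
    std::mt19937 a(seed);
    std::mt19937 b(seed);
    return a == b;
}
```

The C++ standard pins this behaviour down: the 10000th output of a default-constructed std::mt19937 (whose default seed is 5489) must be 4123659995, so nth_output(5489u, 10000) should return that value on any conforming implementation.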

But if you're worried about the engine not being properly seeded, either for cryptographic reasons (though note that std::mt19937 itself is NOT cryptographically secure!) or simply for statistical reasons, you can use a std::seed_seq built from as many rd() values as the engine's state has 32-bit words, so that you can be reasonably confident that the engine is properly seeded.

For casual use, or scenarios where it's not strictly necessary to achieve high quality random numbers, simply initializing with a single call to std::random_device::operator() is fine.

If you want to use a std::seed_seq, make sure you set it up correctly (the example in your original code is definitely not correct, at least for std::mt19937, and would actually yield much worse results than simply using rd()!). The CodeReview post linked in the code comment below contains code which has been vetted properly.

Edit:

For the predefined instantiations of the Mersenne Twister (std::mt19937 and std::mt19937_64), the state size is always 19968 bits, which is slightly more than the 19937 bits it actually needs, but is also the smallest size that can hold them in whole 32-bit words. This works out to 624 words of 32 bits each. So if you plan to use a seed sequence, you would correctly initialize it with 624 invocations of rd():

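//Code copied from https://codereview.stackexchange.com/questions/109260/seed-stdmt19937-from-stdrandom-device
#include <algorithm>   // std::generate
#include <cstdint>     // std::uint32_t
#include <functional>  // std::ref
#include <random>
#include <vector>

// Inside a function: collect 624 words from the device, then seed from all of them.
std::vector<std::uint32_t> random_data(624);
std::random_device source;
std::generate(random_data.begin(), random_data.end(), std::ref(source));
std::seed_seq seeds(random_data.begin(), random_data.end());
std::mt19937 engine(seeds);
//Or:
//std::mt19937_64 engine(seeds);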
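Applied to the question's case of two independent sequences, the same idea might look like the sketch below. The function names make_seeded_mt19937 and draw_from_two_streams are hypothetical, and the integer min/max parameters stand in for the question's unspecified min and max.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper (not from the original post): builds one std::mt19937
// from 624 fresh 32-bit words drawn from the given device.
std::mt19937 make_seeded_mt19937(std::random_device& source) {
    std::vector<std::uint32_t> random_data(std::mt19937::state_size);  // 624 words
    std::generate(random_data.begin(), random_data.end(), std::ref(source));
    std::seed_seq seeds(random_data.begin(), random_data.end());
    return std::mt19937(seeds);
}

// Hypothetical usage: two separately seeded engines, one per distribution.
std::pair<double, int> draw_from_two_streams(int min, int max) {
    assert(min < max);  // both distributions need a non-empty range
    std::random_device rd;
    std::mt19937 en3 = make_seeded_mt19937(rd);
    std::mt19937 en4 = make_seeded_mt19937(rd);
    std::uniform_real_distribution<> ureald(static_cast<double>(min),
                                            static_cast<double>(max));
    std::uniform_int_distribution<> uintd(min, max);
    return std::make_pair(ureald(en3), uintd(en4));
}
```

Each engine gets its own 624 words from the device, so neither sequence is derived from the other. In real code the engines would be created once and reused, rather than rebuilt on every draw as in this compact example.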

If you're working with a non-standard instantiation of std::mersenne_twister_engine, the state size needed for that specific situation can be queried by multiplying its state_size by its word_size and then dividing by 32.

using mt_engine = std::mersenne_twister_engine</*...*/>;
// Number of 32-bit words needed to cover the engine's full state.
constexpr std::size_t state_size = mt_engine::state_size * mt_engine::word_size / 32;
std::vector<std::uint32_t> random_data(state_size);
std::random_device source;
std::generate(random_data.begin(), random_data.end(), std::ref(source));
std::seed_seq seeds(random_data.begin(), random_data.end());
mt_engine engine(seeds);
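As a sanity check on that formula, the following sketch evaluates it at compile time for the two predefined engines. The helper name seed_words is an assumption made for this example.

```cpp
#include <cstddef>
#include <random>

// Hypothetical helper: number of 32-bit seed words that cover the full state
// of a std::mersenne_twister_engine instantiation.
template <class Engine>
constexpr std::size_t seed_words() {
    return Engine::state_size * Engine::word_size / 32;
}

// std::mt19937:    624 words of 32 bits -> 624 * 32 / 32 = 624
static_assert(seed_words<std::mt19937>() == 624, "mt19937 needs 624 words");
// std::mt19937_64: 312 words of 64 bits -> 312 * 64 / 32 = 624
static_assert(seed_words<std::mt19937_64>() == 624, "mt19937_64 needs 624 words");
```

Both predefined engines come out at 624 words, which is why the same 624-element vector works for either of them in the earlier snippet.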

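For other engine types, you'll need to evaluate them on a case-by-case basis. std::linear_congruential_engine and its predefined variants hold a single integer of their word size as state, so a single invocation of rd() is enough to initialize them, and seed sequences are unnecessary. std::subtract_with_carry_engine is different: its state consists of long_lag words of word_size bits each, plus a carry (24 words of 24 bits for std::ranlux24_base, 12 words of 48 bits for std::ranlux48_base), so a seed sequence can be worthwhile there as well. std::discard_block_engine is an adaptor that wraps such a base engine (as in std::ranlux24 and std::ranlux48) and is seeded through it.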
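For the subtract-with-carry engines, the state size can be read off the engine's public constants. The sketch below uses a made-up helper, swc_seed_words, and assumes the standard's seeding rule for this engine family, which consumes ceil(word_size / 32) seed words per state word.

```cpp
#include <cstddef>
#include <random>

// Hypothetical helper: number of 32-bit seed words a
// std::subtract_with_carry_engine instantiation consumes from a seed sequence,
// i.e. long_lag state words times ceil(word_size / 32) seed words each.
template <class Engine>
constexpr std::size_t swc_seed_words() {
    return Engine::long_lag * ((Engine::word_size + 31) / 32);
}

// ranlux24_base: 24 state words of 24 bits -> 24 * 1 = 24 seed words
static_assert(std::ranlux24_base::long_lag == 24, "24 state words");
static_assert(swc_seed_words<std::ranlux24_base>() == 24, "24 seed words");
// ranlux48_base: 12 state words of 48 bits -> 12 * 2 = 24 seed words
static_assert(std::ranlux48_base::long_lag == 12, "12 state words");
static_assert(swc_seed_words<std::ranlux48_base>() == 24, "24 seed words");
```

So a std::seed_seq built from 24 calls to rd() would cover the state of either base engine, by the same reasoning as the 624 words used for std::mt19937.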
