Distributions and internal state
Question
On Stack Overflow there are many questions about generating uniformly distributed integers from ranges that are not known a priori, e.g.
- C++11 Generating random numbers from frequently changing range
- Vary range of uniform_int_distribution
The typical solution is something like:
inline std::mt19937 &engine()
{
    thread_local std::mt19937 eng;
    return eng;
}

int get_int_from_range(int from, int to)
{
    std::uniform_int_distribution<int> dist(from, to);
    return dist(engine());
}
A distribution is supposed to be a lightweight object, so recreating it many times should not be a performance concern. Still, it seems that even a simple distribution may very well have, and usually will have, some internal state.
So I was wondering whether I still get properly distributed results if I interfere with how the distribution works by constantly resetting it (i.e. recreating the distribution at every call of get_int_from_range).
There's a long discussion between Pete Becker and Steve Jessop, but without a final word. In another question (Should I keep the random distribution object instance or can I always recreate it?) the "problem" of the internal state doesn't seem very important.
Does the C++ standard make any guarantee regarding this topic?
Is the following implementation (from N4316 - std::rand replacement) somewhat more reliable?
int get_int_from_range(int from, int to)
{
    using distribution_type = std::uniform_int_distribution<int>;
    using param_type = typename distribution_type::param_type;

    thread_local std::uniform_int_distribution<int> dist;
    return dist(engine(), param_type(from, to));
}
EDIT
This reuses a possible internal state of a distribution, but it's complex and I'm not sure it is worth the trouble:
int get_int_from_range(int from, int to)
{
    using range_t = std::pair<int, int>;
    using map_t = std::map<range_t, std::uniform_int_distribution<int>>;

    thread_local map_t range_map;

    auto i = range_map.find(range_t(from, to));
    if (i == std::end(range_map))
        i = range_map.emplace(
                std::make_pair(from, to),
                std::uniform_int_distribution<int>{from, to}).first;

    return i->second(engine());
}
(From https://stackoverflow.com/a/30097323/3235496)
Recommended answer
Interesting question.
"So I was wondering if interfering with how the distribution works by constantly resetting it (i.e. recreating the distribution at every call of get_int_from_range) I get properly distributed results."
I've written code to test this with uniform_int_distribution and poisson_distribution. It's easy enough to extend this to test another distribution if you wish. The answer seems to be yes.
Boilerplate code:
#include <random>
#include <memory>
#include <chrono>
#include <utility>

typedef std::mt19937_64 engine_type;

// Seed taken from the system clock.
inline size_t get_seed()
    { return std::chrono::system_clock::now().time_since_epoch().count(); }

// One engine shared by all callers, created on first use.
engine_type& engine_singleton()
{
    static std::unique_ptr<engine_type> ptr;

    if ( !ptr )
        ptr.reset( new engine_type(get_seed()) );
    return *ptr;
}

// ------------------------------------------------------------------------

#include <cmath>
#include <cstdio>
#include <vector>
#include <string>
#include <algorithm>

// Print one row of '*' per bin, scaled by `mass`.
void plot_distribution( const std::vector<double>& D, size_t mass = 200 )
{
    const size_t n = D.size();
    for ( size_t i = 0; i < n; ++i )
    {
        printf("%02zu: %s\n", i,
            std::string(static_cast<size_t>(D[i]*mass),'*').c_str() );
    }
}

// Largest bin-wise absolute difference between two histograms of equal size.
double maximum_difference( const std::vector<double>& x, const std::vector<double>& y )
{
    const size_t n = x.size();

    double m = 0.0;
    for ( size_t i = 0; i < n; ++i )
        m = std::max( m, std::abs(x[i]-y[i]) );

    return m;
}
Actual test code:
#include <iostream>
#include <vector>
#include <cstdio>
#include <random>
#include <string>
#include <cmath>

void compare_uniform_distributions( int lo, int hi )
{
    const size_t sample_size = 1e5;

    // Initialize histograms
    std::vector<double> H1( hi-lo+1, 0.0 ), H2( hi-lo+1, 0.0 );

    // Initialize distribution
    auto U = std::uniform_int_distribution<int>(lo,hi);

    // Count!
    for ( size_t i = 0; i < sample_size; ++i )
    {
        // E is a new engine, seeded from the clock at every iteration
        engine_type E(get_seed());

        H1[ U(engine_singleton())-lo ] += 1.0;
        H2[ U(E)-lo ] += 1.0;
    }

    // Normalize histograms to obtain "densities"
    for ( size_t i = 0; i < H1.size(); ++i )
    {
        H1[i] /= sample_size;
        H2[i] /= sample_size;
    }

    printf("Engine singleton:\n"); plot_distribution(H1);
    printf("Engine creation :\n"); plot_distribution(H2);
    printf("Maximum difference: %.3f\n", maximum_difference(H1,H2) );
    std::cout<< std::string(50,'-') << std::endl << std::endl;
}

void compare_poisson_distributions( double mean )
{
    const size_t sample_size = 1e5;
    const size_t nbins = static_cast<size_t>(std::ceil(2*mean));

    // Initialize histograms
    std::vector<double> H1( nbins, 0.0 ), H2( nbins, 0.0 );

    // Initialize distribution
    auto U = std::poisson_distribution<int>(mean);

    // Count!
    for ( size_t i = 0; i < sample_size; ++i )
    {
        // E is a new engine, seeded from the clock at every iteration
        engine_type E(get_seed());
        int u1 = U(engine_singleton());
        int u2 = U(E);

        // Poisson samples are non-negative; values beyond the last bin are dropped
        if (static_cast<size_t>(u1) < nbins) H1[u1] += 1.0;
        if (static_cast<size_t>(u2) < nbins) H2[u2] += 1.0;
    }

    // Normalize histograms to obtain "densities"
    for ( size_t i = 0; i < H1.size(); ++i )
    {
        H1[i] /= sample_size;
        H2[i] /= sample_size;
    }

    printf("Engine singleton:\n"); plot_distribution(H1);
    printf("Engine creation :\n"); plot_distribution(H2);
    printf("Maximum difference: %.3f\n", maximum_difference(H1,H2) );
    std::cout<< std::string(50,'-') << std::endl << std::endl;
}

// ------------------------------------------------------------------------

int main()
{
    compare_uniform_distributions( 0, 25 );
    compare_poisson_distributions( 12 );
}
Run it here.
"Does the C++ standard make any guarantee regarding this topic?"
Not that I know of. However, I would say that the standard makes an implicit recommendation not to re-create the engine every time: for any distribution Distrib, the prototype of Distrib::operator() takes a reference URNG& and not a const reference. This is understandably required because the engine might need to update its internal state, but it also implies that code looking like this
auto U = std::uniform_int_distribution<int>(0,10);
for ( <something here> ) U(engine_type());
does not compile, which to me is a clear incentive not to write code like this.
I'm sure there is plenty of advice out there on how to properly use the random library. It does get complicated if you have to handle the possibility of using random_devices and allowing deterministic seeding for testing purposes, but I thought it might be useful to throw my own recommendation out there too:
#include <random>
#include <chrono>
#include <utility>
#include <functional>

inline size_t get_seed()
    { return std::chrono::system_clock::now().time_since_epoch().count(); }

template <class Distrib>
using generator_type = std::function< typename Distrib::result_type () >;

template <class Distrib, class Engine = std::mt19937_64, class... Args>
inline generator_type<Distrib> get_generator( Args&&... args )
{
    return std::bind( Distrib( std::forward<Args>(args)... ), Engine(get_seed()) );
}

// ------------------------------------------------------------------------

#include <iostream>

int main()
{
    auto U = get_generator<std::uniform_int_distribution<int>>(0,10);
    std::cout<< U() << std::endl;
}
Run it here. Hope this helps!
EDIT: My first recommendation was a mistake, and I apologise for that. We can't use a singleton engine as in the tests above, because this would mean that two uniform int distributions would produce the same random sequence. Instead I rely on the fact that std::bind copies the newly created engine, with its own seed, locally into the std::function, and this yields the expected behaviour: different generators with the same distribution produce different random sequences.