Is repeatedly seeding a random number generator a reasonable hash function?

Question

I wish to generate a large amount of random data, which is reproducible for a given key, comprising a list of numbers:

[a, b, c, d, e, ...]

Is the following a good or sensible way to get an RNG into a state to generate random data, such that for each n-tuple [a, b, c, ..., n] the data is uncorrelated with the output for the "adjacent" n-tuples [a+1, b, c, ..., n], [a, b+1, c, ..., n], etc.?

srand(a);
srand(rand() * b);
srand(rand() * c);
...
srand(rand() * n);

// generate random data:
for (int i = 0; i < 100; ++i)
  printf("%d\n", rand());

I think this question boils down to the following: is rand_hash a good hash function for the 2-tuple (a, b)?

int rand_hash(int a, int b) { 
  srand(a); 
  srand(rand() * b); 
  return rand();
}

NB: I don't wish to imply that srand and rand are any particular implementation of an RNG. Assume for the sake of argument that we're using a good Mersenne Twister code.

Edit: If it isn't clear, by "reasonable hash function" I mean the following. In the restricted case of a 2-tuple [a, b], the output of rand_hash should be uniform over the range of int, and (typically) there should be no correlation between the magnitude of the change in a or b and the magnitude of the change in the return value.

Recommended answer

No, this is not a reasonable approach.


  1. You don't know what the implementation of rand is. Random number generators are designed to provide approximately uniformly distributed numbers over a period of several generated numbers. They are not designed to provide uniformly distributed numbers over the set of (32 bit) seeds. In your hypothetical mersenne_twister case, the random number generator has state much larger than the integer you supply to srand (specifically, 624*sizeof(int)). Most of the power the RNG has to ensure its output is random and uniform comes from that additional state, and you took that away. (The seed can select only one of 2^32 states.)
  2. If you ever upgrade your compiler or libraries or something similar, anything you might have serialized to disk will become unreadable. (If rand is a black box, nobody says that tomorrow's implementation matches today's.)
  3. Your hashing function's output returns the same thing for the same inputs to srand. Therefore, you already have a hash: the input to srand. The RNG generates the same output for a given input to srand. Therefore the number of distinct hashes you may obtain is no greater than if you had just returned the hash you had already calculated. If your initial hash into srand is of poor distribution for a hash table, then scale the hash appropriately such that it performs well in your table.
  4. For some implementations of rand, this performs extremely poorly. Consider a linear congruential generator (which is more common with C libraries because it has state of sizeof(int), e.g. the BSD generator). An LCG follows the form xNext = a*xCurrent + b. Consider:

static int seed = 0;

void srand(int newSeed)
{
    seed = newSeed;
}

int rand()
{
    seed = (int) ((1103515245 * ((unsigned int)seed) + 12345) & 0x7fffffffUL); 
    return seed;
}

Note that this (common) type of generator produces hash values easily correlated to your input values.
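
To make that correlation concrete, here is a minimal sketch that feeds the question's rand_hash scheme through an LCG with the constants shown above. The names lcg_srand, lcg_rand, lcg_rand_hash and count_odd_hashes are hypothetical, chosen only to avoid clashing with the standard library, and the arithmetic is done in uint32_t (an assumption, so that the rand() * b product wraps instead of overflowing a signed int). It only looks at the lowest bit of the result:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical LCG state; same constants as the generator shown above. */
static uint32_t lcg_state = 0;

void lcg_srand(uint32_t seed)
{
    lcg_state = seed;
}

uint32_t lcg_rand(void)
{
    /* Unsigned arithmetic wraps modulo 2^32, then the mask keeps 31 bits. */
    lcg_state = (1103515245u * lcg_state + 12345u) & 0x7fffffffu;
    return lcg_state;
}

/* The question's two-step scheme: seed with a, reseed with rand() * b. */
uint32_t lcg_rand_hash(uint32_t a, uint32_t b)
{
    lcg_srand(a);
    lcg_srand(lcg_rand() * b);
    return lcg_rand();
}

/* Counts how many of the keys a = 0 .. n-1 hash to an odd value for a fixed b. */
int count_odd_hashes(uint32_t b, int n)
{
    assert(n >= 0);
    int odd = 0;
    for (uint32_t a = 0; a < (uint32_t)n; ++a)
        odd += (int)(lcg_rand_hash(a, b) & 1u);
    return odd;
}
```

Under these assumptions, count_odd_hashes(2, 1000) returns 1000: because the multiplier and the increment are both odd, every even b forces an odd result. For odd b the lowest bit of the hash simply equals the lowest bit of a, so count_odd_hashes(3, 1000) returns 500, and lcg_rand_hash(a, 0) is 12345 for every a. A hash whose low bit is decided entirely by the parity of its inputs is far from the uniformity the question asks for.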
