随机数生成器(RNG/PRNG)返回种子的更新值 [英] Random number generator (RNG/PRNG) that returns updated value of seed

查看:148
本文介绍了随机数生成器(RNG/PRNG)返回种子的更新值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试编写一个RNG,该RNG还返回更新后的种子的值.这样做的可能显而易见的原因是,以后可以在不更改现有RNG值的情况下将新的随机变量添加到程序中.有关此问题的python/numpy版本,请参见例如:

I'm trying to write an RNG that also returns the value of the updated seed. The perhaps obvious reason for this is so that new random variables can be added to the program later, without changing the values of the existing RNGs. For a python/numpy version of this issue see for example: Difference between RandomState and seed in numpy

下面是一个(尝试性的)建议解决方案的用法示例:

Here's a example of usage with a (tentative) proposed solution:

program main

   ! note that I am assuming size=12 for the random 
   ! seed but this is implementation specific and
   ! could be different in your implementation
   ! (you can query this with random_seed(size) btw)

   integer :: iseed1(12) = 123456789
   integer :: iseed2(12) = 987654321

   do i = 1, 2
      x = ran(iseed1)
      y = ran(iseed2)
   end do

end program main

function ran( seed )

   integer, intent(inout) :: seed(12)
   real                   :: ran

   call random_seed( put=seed )
   call random_number( ran )
   call random_seed( get=seed )

end function ran

请注意,此解决方案(以及任何其他解决方案)的重要方面是,如果在上面添加了seed3x3,则x1x2.同样,可以从代码中删除x1x2,而不会影响其他代码的值.

Note that a vital aspect of this solution (and any other solutions) is that if we added seed3 and x3 to the above, there would be no change to the realized values of x1 and x2. Similarly, either x1 or x2 could be deleted from the code without affecting the values of the other one.

如果有帮助,这ran()(扩展)功能如何在Compaq/HP和Intel编译器上起作用,我基本上是在尝试在gfortran上实现相同的API.但是请注意,该函数的种子是标量,而它是使用Fortran 90子例程random_seed的一维数组(数组的大小/长度特定于实现).

If it helps, this how the ran() (extension) function works on Compaq/HP and Intel compilers and I'm basically trying to implement that same API on gfortran. Note, however, that the seed for that function is a scalar whereas it is a one-dimensional array using the Fortran 90 subroutine random_seed (with the size/length of the array being implementation specific).

我正在提供当前的解决方案作为基准错误,但希望其他人可以批评该回答或提供更好的解决方案.

I am providing my current solution as a benchmark error but hope others can either critique that answer or provide better ones.

我希望根据标准PRNG测试对基准结果进行任何分析,尤其是对种子结实的分析.在基准测试中,我仅使用标量广播到非常简单明了的数组,并希望避免显式提供整个整数数组.

I would appreciate any analysis of the benchmark results according to standard PRNG tests and in particular, analysis of how the seed is set. In the benchmark I am merely using a scalar broadcast to an array which is very simple and concise and would like to avoid explicitly providing an entire array of integers.

因此,我想或者稍微严格地确认这很好,还是要比写出整个数组(例如12或33个整数)更简洁的方法来设置种子(以可重复的方式!).例如.也许有一些简单明了的方法可以从单个整数中生成一个由12个伪随机整数组成的漂亮流(由于数组长度如此之短,这可能很容易吗?).

So I would like either a somewhat rigorous confirmation that this is fine or else a more concise method of setting the seed (in a repeatable way!) than writing out an entire array of, say, 12 or 33 integers. E.g. maybe there is some simple and concise way to produce a nice stream of 12 pseudo-random integers from a single integer (as the array length is so short, this is probably easy?).

编辑以添加:有关在fortran中设置种子的后续问题:

Edit to add: Followup question about setting seeds in fortran: Correctly setting random seeds for repeatability

推荐答案

您提出的解决方案在我看来应该可以正常工作-您正在记录生成器的整个状态(通过get),并在必要时在流之间进行交换(通过put).请注意,尽管如此,我尚未真正测试过您的代码.

Your proposed solution looks to me like it should work - you are recording the entire state of the generator (via get), and swapping between streams when necessary (via put). Note that I haven't actually tested your code, though.

之所以出现这个答案,是因为先前的答案(现在已删除)仅保存了状态数组的第一个元素,并使用它来设置整个状态数组.看起来像这样:

This answer came about because a previous answer (now deleted) saved only the first element of the state array, and was using it to set the entire state array. It looked something like:

integer :: seed_scalar, state_array(12)
...
! -- Generate a random number from a thread that is stored as seed_scalar
state_array(:) = seed_scalar
call random_seed( put=state_array )
call random_number( ran )
call random_seed( get=state_array )
seed_scalar = state_array(1)
! -- discard state_array, store seed_scalar

该答案旨在证明,对于某些编译器(gfortran 4.8和8.1,pgfortran 15.10)而言,这种仅通过标量维护单独线程的方法会导致不良的RNG行为.

This answer is intended to demonstrate that, for some compilers (gfortran 4.8 and 8.1, pgfortran 15.10), this method of maintaining separate threads via only a scalar results in bad RNG behavior.

请考虑以下代码,以从根本上测试随机数生成器.生成了许多随机数(在此示例中为100M),并且通过两种方式监视性能.首先,记录随机数增加或减少的时间.其次,生成条宽为0.01的直方图. (这显然是一个原始的测试用例,但事实证明足以证明这一点.)最后,还生成了所有概率的估计单西格玛标准差.这使我们能够确定何时变异是随机的或具有统计意义的.

Consider the following code to primitively test a random number generator. Many random numbers are generated - 100M for this example - and performance is monitored in two ways. First, counts for when the random number increases or decreases are recorded. Second, a histogram with bin width 0.01 is generated. (This is obviously a primitive test case, but it turns out to be sufficient to prove the point.). Finally, the estimated one-sigma standard deviation for all probabilities is also generated. This allows us to determine when variations are random or statistically significant.

program main
   implicit none
   integer, parameter :: wp = selected_real_kind(15,307)
   integer :: N_iterations, state_size, N_histogram
   integer :: count_incr, count_decr, i, loc, fid
   integer, allocatable, dimension(:) :: state1, histogram
   real(wp) :: ran, oldran, p, dp, re, rt

   ! -- Some parameters
   N_iterations = 100000000
   N_histogram = 100

   call random_seed(size = state_size)
   allocate(state1(state_size), histogram(N_histogram))
   write(*,'(a,i3)') '-- State size is: ', state_size

   ! -- Initialize RNG, taken as today's date (also tested with other initial seeds)
   state1(:) = 20180815
   call random_seed(put=state1)

   count_incr = 0
   count_decr = 0
   histogram = 0
   ran = 0.5_wp      ! -- To yield proprer oldran

   ! -- Loop to grab a batch of random numbers
   do i=1,N_iterations
      oldran = ran

      ! -- This preprocessor block modifies the RNG state in a manner
      ! -- consistent with storing only the first value of it
#ifdef ALTER_STATE
      call random_seed(get=state1)
      state1(:) = state1(1)
      call random_seed(put=state1)
#endif

      ! -- Get the random number
      call random_number(ran)

      ! -- Process Histogram
      loc = CEILING(ran*N_histogram)
      histogram(loc) = histogram(loc) + 1

      if (oldran<ran) then
         count_incr = count_incr + 1
      elseif (oldran>ran) then
         count_decr = count_decr + 1
      else
         ! -- This should be statistically impossible
         write(*,*) '** Error, same value?', ran, oldran
         stop 1
      endif
   enddo

   ! -- For this processing, we have:
   !     re    Number of times this event occurred
   !     rt    Total number
   ! -- Probability of event is just re/rt
   ! -- One-sigma standard deviation is sqrt( re*(rt-re)/rt**3 )

   ! -- Write histogram
   ! -- For each bin, compute probability and uncertainty in that probability
   open(newunit=fid, file='histogram.dat', action='write', status='replace')
   write(fid,'(a)') '# i, P(i), dP(i)'

   rt = real(N_iterations,wp)
   do i=1,N_histogram
      re = real(histogram(i),wp)
      p = re/rt
      dp = sqrt(re*(rt-1)/rt**3)

      write(fid,'(i4,2es26.18)') i, p, dp
   enddo
   close(fid)

   ! -- Probability of increase and decrease
   re = real(count_incr,wp)
   p = re/rt
   dp = sqrt(re*(rt-1)/rt**3)
   write(*,'(a,f20.15)') 'Probability of increasing: ', p
   write(*,'(a,f20.15)') '      one-sigma deviation: ', dp

   re = real(count_decr,wp)
   p = re/rt
   dp = sqrt(re*(rt-1)/rt**3)
   write(*,'(a,f20.15)') 'Probability of decreasing: ', p
   write(*,'(a,f20.15)') '      one-sigma deviation: ', dp

   write(*,'(a)') 'complete'

end program main

不做修改

没有预处理器指令ALTER_STATE,我们将按预期使用gfortran的内置PRNG,并且结果符合预期:

Without Modification

Without the preprocessor directive ALTER_STATE, we are using gfortran's built-in PRNG as intended, and the results are as expected:

enet-mach5% gfortran --version
GNU Fortran (SUSE Linux) 4.8.3 20140627 [gcc-4_8-branch revision 212064]
Copyright (C) 2013 Free Software Foundation, Inc.

GNU Fortran comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of GNU Fortran
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING

enet-mach5% gfortran -cpp -fcheck=all main.f90 && time ./a.out
-- State size is:  12
Probability of increasing:    0.499970710000000
      one-sigma deviation:    0.000070708606619
Probability of decreasing:    0.500029290000000
      one-sigma deviation:    0.000070712748851
complete

real    0m2.414s
user    0m2.408s
sys     0m0.004s

增加/减少的预期概率为0.5,并且两者都具有估计的不确定性(0.49997从0.5小于0.00007).带有误差线的直方图是

The expected probability for increasing/decreasing is 0.5, and both are with the estimated uncertainty (0.49997 is less than 0.00007 from 0.5). The histogram, plotted with error bars, is

对于每个面元,与预期概率(0.01)的差异很小,并且通常在估计的不确定性之内.因为我们生成了许多数字,所以所有变化都很小(0.1%的量级).基本上,此测试未发现任何可疑行为.

For each bin, the variation from the expected probability (0.01) is small, and typically within the estimated uncertainty. Because we have generated many numbers, all variations are small (order 0.1%). Basically, this test hasn't found any suspicious behavior.

如果启用ALTER_STATE中的块,则每次生成数字时,我们都会修改随机数生成器的内部状态.这旨在模拟现在已删除的解决方案,该解决方案仅存储状态的第一个值.结果是:

If we enable the block inside ALTER_STATE, we are modifying the internal state of the random number generator each time a number is generated. This is intended to mimic a now-deleted solution, which stored only the first value of the state. The results are:

enet-mach5% gfortran -cpp -DALTER_STATE -fcheck=all main.f90 && time ./a.out
-- State size is:  12
Probability of increasing:    0.501831930000000
      one-sigma deviation:    0.000070840096343
Probability of decreasing:    0.498168070000000
      one-sigma deviation:    0.000070581021884
complete

real    0m16.489s
user    0m16.492s
sys     0m0.000s

观察到的增长可能性远超出预期的变化(26 sigma!).这已经表明出了点问题.直方图是:

The observed probability of increasing is far outside the expected variation (26 sigma!). This already indicates something is wrong. The histogram is:

请注意,y的范围已发生很大变化.在这里,我们的变化比以前的情况大了大约两个数量级,远远超出了预期的变化.由于y范围大得多,因此在这里很难看到误差线.如果我的随机数生成器的性能是这样的,那么我将它用在任何事情上都不会感到舒服,甚至没有掷硬币的感觉.

Note that the range in y has changed substantially. Here, we have a variation of approximately two orders of magnitude larger than the previous case, far outside of the expected variation. The error bars are hard to see, here, because the y-range is so much larger. If my random number generator performed like this, I wouldn't feel comfortable using it for anything, not even a coin flip.

random_seedputget选项访问随机数生成器的处理器相关内部状态.它通常比单个数具有更多的熵,并且其形式取决于处理器.不能保证第一个数字完全代表整个州.

The put and get options of random_seed access the processor-dependent internal state of the random number generator. This typically has more entropy than a single number, and its form is processor-dependent. There's no guarantee that the first number is at all representative of the entire state.

如果您要初始化一次随机种子并生成多次,则使用单个标量就可以了.但是,如果您打算使用该状态来生成每个单一数字,那么显然有必要存储多个单一数字.

If you're initializing a random seed once, and generating many times, using a single scalar is fine. However, it's clearly necessary to store more than a single number if you intend to use that state to generate every single number.

坦率地说,我对此原始测试能够证明不良行为感到有些惊讶. RNG的有效性是一个复杂的主题,我绝不是专家.结果还取决于编译器:

I'm frankly a little bit surprised this primitive test was able to demonstrate the bad behavior. Validity of RNG is a complex subject, and I am by no means an expert. The results were also compiler-dependent:

  • 显示的结果和直方图是针对gfortran 4.8(状态大小为12)的.
  • Intel 16.0使用的状态大小为2,该测试未显示任何不良行为.
  • gfortran 8.1.0的状态大小为33,PGI 15.10的状态大小为34.它们的行为是相同的,并且是最坏的.将整个状态设置为单个值时,随后的随机数总是 相同.从单个种子初始化时,这些生成器需要大约30代的启动"才能使数字看起来合理.在状态中仅保存一个种子时,这种启动不会发生.
  • The results and histograms shown are for gfortran 4.8, which has a state size of 12.
  • Intel 16.0 uses a state size of 2, and this test didn't show any undesirable behavior.
  • gfortran 8.1.0 has a state size of 33, and PGI 15.10 has a state size of 34. Their behavior was the same, and the worst yet. When setting the entire state to a single value, subsequent random numbers are always identical. These generators require a 'priming' of around 30 generations before the numbers start looking reasonable, when initialized from a a single seed. When saving only a single seed in the state, this priming never occurs.

仅保存单个值时,较大的状态大小可能导致更多的熵损失,因此它与较差的行为相关.这当然符合我的观察.但是,如果不了解每个生成器的内部,就无法分辨.

It's possible that a larger state size results in more entropy loss when saving only a single value, so it's correlated with poorer behavior. That certainly fits with my observations. Without knowing the internals of each generator, though, it's impossible to tell.

这篇关于随机数生成器(RNG/PRNG)返回种子的更新值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆