Possible issue about random number generator


Question

I need to generate a certain number of random numbers from a sequence of integers, and I use the following code: result<-sample(x=c(2:50), size=10e6, replace=T). I find that as I increase the length of the result vector (up to a length of 10^6), the distribution of the random numbers does not look random if the length of the vector x is an odd number. When plotting the histogram of result, the first number of the sequence (the '2' in this example) usually gets a column (and so a number of elements) that is always higher than the other columns. If x=c(1:50), so that the length of x is an even number, the behaviour of the random generator seems to be ok. Is there any issue with the random number generators in R that would explain this strange result? I use R 3.0.1 under Ubuntu 13.10.

Recommended answer

As I mentioned in my comment above, this has absolutely nothing to do with random number generators.

Consider:

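set.seed(123)
# Draw 10e4 (= 100,000) values, with replacement, from the integers 2..50
result <- sample(x=c(2:50), size=10e4, replace=TRUE)
# Keep the histogram object so its breaks and counts can be inspected
x <- hist(result)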

Something looks wrong, eh? But look closer:

> x$breaks
 [1]  2  4  6  8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50
> x$counts
 [1] 6132 3971 4179 4115 4108 4002 4145 4073 4192 4117 4123 4099 4054 4013 4067 4055 4073 4082 4095
[20] 4088 4044 4050 4027 4096

versus...

> table(result)
result
   2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21 
1979 2100 2053 1978 1993 2152 2027 2058 2057 2074 2034 1991 2011 2075 2070 2067 2006 2047 2145 2019 
  22   23   24   25   26   27   28   29   30   31   32   33   34   35   36   37   38   39   40   41 
2098 2060 2063 2099 2000 2016 2038 1990 2023 1976 2091 2060 1995 2061 2012 2003 2079 2008 2087 2036 
  42   43   44   45   46   47   48   49   50 
2052 1989 2055 2044 2006 2001 2026 2062 2034 

Note that the first bin from hist appears to include all the 2, 3 and 4 values. This is because the default binning strategy employed by hist adds some "fuzziness" to the bin boundaries, which results in the first two break points being slightly less than 2.0 and slightly more than 4.0, respectively. Combine that with the intervals being right-closed, and you get the resulting histogram.
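To make the bin membership explicit, here is a minimal sketch on toy data that is not part of the original post: v holds each integer from 2 to 50 exactly once, and the breaks 2, 4, ..., 50 are passed explicitly so the result does not depend on hist's automatic break selection. The names v, b, h and bins are illustrative only.

```r
# Toy data (an assumption for illustration): each integer 2..50 exactly once
v <- 2:50

# Same break points as in the histogram above, supplied explicitly
b <- seq(2, 50, by = 2)

# Compute the histogram without drawing it
h <- hist(v, breaks = b, plot = FALSE)

# The first bin holds three values (2, 3 and 4); every other bin holds two
print(h$counts)

# cut() with the same right-closed, lowest-included convention
# shows which interval each value is assigned to
bins <- cut(v, breaks = b, right = TRUE, include.lowest = TRUE)
print(levels(bins)[1])   # label of the first interval: "[2,4]"
print(table(bins))
```

With perfectly even data the first bar is still 1.5 times the height of the others, which shows that the effect comes from the binning and not from sample.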

Compare:

hist(result, breaks = 1:50)
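For integer-valued data there are other ways to get one bar per value. The sketch below is a suggestion beyond the original answer: it places the breaks at half-integers (1.5, 2.5, ..., 50.5) so that no data value ever sits on a bin boundary, and it checks the histogram counts against table(result). The variable names h and counts are illustrative, and the exact counts depend on the R version's sampling algorithm.

```r
set.seed(123)
result <- sample(x = 2:50, size = 1e5, replace = TRUE)

# Breaks at half-integers: each bin (k - 0.5, k + 0.5] contains exactly
# one integer k, so boundary handling no longer matters
h <- hist(result, breaks = seq(1.5, 50.5, by = 1), plot = FALSE)

# Per-value frequencies computed directly, without any binning
counts <- table(result)

# The two approaches agree bin by bin
stopifnot(all(h$counts == as.vector(counts)))

# For discrete data a bar plot of the table is often the clearer picture
barplot(counts)
```

Either form gives 49 bars of roughly equal height, one for each value from 2 to 50.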
