.Net 的“Random"类中的错误? [英] Bug in .Net's `Random` class?

查看:55
本文介绍了.Net 的“Random"类中的错误?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在看一个问题,该问题讨论的是 Fisher-Yates shuffle 算法的错误实现,但我对错误实现时存在偏差感到困惑.

这两种算法是:

private Random _random = new Random();公共 int[] FisherYates(int[] 源){int[] 输出 = source.ToArray();for (var i = 0; i < output.Length; i++){var j = _random.Next(i, output.Length);(输出[i], 输出[j]) = (输出[j], 输出[i]);}返回输出;}public int[] FisherYatesBad(int[] source){int[] 输出 = source.ToArray();for (var i = 0; i < output.Length; i++){var j = _random.Next(0, output.Length);(输出[i], 输出[j]) = (输出[j], 输出[i]);}返回输出;}

一个非常微妙的不同,但足以引起巨大的偏见.

良好的实施:

糟糕的实施:

为了清楚说明这些图,我从数字 0 到 99 开始,使用任何算法创建 10_000_000 次随机播放,然后对每个随机播放中的值求平均值以获得一组数字.如果 shuffle 尝试随机,那么所有 100 个数字都属于同一个正态分布.

现在,一切都很好,但我想我会检查一下这些方法是否产生了有效的结果:

public int[] OrderByRandomNext(int[] source) =>source.OrderBy(x => _random.Next()).ToArray();public int[] OrderByRandomNextDouble(int[] source) =>source.OrderBy(x => _random.NextDouble()).ToArray();

两者都很好,但它们是公平的洗牌吗?

OrderByRandomNext:

OrderByRandomNextDouble:

是否注意到 1100 的数字都明显较低?

嗯,我认为这可能是 OrderBy 工作方式的产物.因此,我使用另一个随机数生成器对其进行了测试——Eric Lippert 在他改进的随机系列中使用了一个.

public int[] OrderByBetterRandomNextDouble(int[] source) =>source.OrderBy(x => BetterRandom.NextDouble()).ToArray();公共静态类 BetterRandom{私有静态只读 ThreadLocalcng =new ThreadLocal(RandomNumberGenerator.Create);私有静态只读 ThreadLocal字节 =new ThreadLocal(() => new byte[sizeof(int)]);public static int NextInt(){crng.Value.GetBytes(bytes.Value);返回 BitConverter.ToInt32(bytes.Value, 0) &int.MaxValue;}公共静态双 NextDouble(){而(真){long x = NextInt() &0x001FFFFF;x<<=31;x |= (long)NextInt();双 n = x;const double d = 1L <<52;双 q = n/d;如果(q != 1.0)返回 q;}}}

好吧,这是图表:

没有偏见!

这是我生成数据的代码(在 LINQPad 中运行):

void Main(){无功n = 100;无功 s = 1000000;var numbers = Enumerable.Range(0, n).ToArray();var 算法 = new Func[]{费舍尔耶茨OrderByRandomNext,OrderByRandomNextDouble,OrderByBetterRandomNextDouble,};var 平均值 =算法.选择(算法=>可枚举.Range(0, numbers.Length).Select(x =>可枚举.Range(0, s).Select(y => algorithm(numbers)).Aggregate(0.0, (a, v) => a + (double)v[x]/s)).ToArray()).Select(x => new{平均值 = x,分布 = Accord.Statistics.Distributions.Univariate.NormalDistribution.Estimate(x.Skip(1).SkipLast(1).ToArray()),first = x.First(),last = x.Last(),}).Select(x => new{x.平均值,x.分布,x.首先,x.最后,first_prob =x.distribution.DistributionFunction(x.first),last_prob = x.distribution.DistributionFunction(x.last),}).ToArray();无功d =平均数.转储();}私人随机_随机=新随机();公共 int[] FisherYates(int[] 源){int[] 输出 = source.ToArray();for (var i = 0; i < output.Length; i++){var j = _random.Next(i, output.Length);(输出[i], 输出[j]) = (输出[j], 输出[i]);}返回输出;}public int[] OrderByRandomNext(int[] source) =>source.OrderBy(x => _random.Next()).ToArray();public int[] OrderByRandomNextDouble(int[] source) =>source.OrderBy(x => _random.NextDouble()).ToArray();public int[] OrderByBetterRandomNextDouble(int[] source) =>source.OrderBy(x => BetterRandom.NextDouble()).ToArray();公共静态类 BetterRandom{私有静态只读 ThreadLocalcng =new ThreadLocal(RandomNumberGenerator.Create);私有静态只读 ThreadLocal字节 =new ThreadLocal(() => new byte[sizeof(int)]);public static int NextInt(){crng.Value.GetBytes(bytes.Value);返回 BitConverter.ToInt32(bytes.Value, 0) &int.MaxValue;}公共静态双 NextDouble(){而(真){long x = NextInt() &0x001FFFFF;x<<=31;x |= (long)NextInt();双 n = x;const double d = 1L <<52;双 q = n/d;如果(q != 1.0)返回 q;}}}

这是我生成的数据:

<前>分布|第一 |最后 |first_prob |最后一个问题-------------------------------------------------------- |------------------ |------------------ |--------------- |---------------------N(x; μ = 49.50267467345823, σ² = 0.0008896228453062147) |49.505465999987585 |49.49833699998965 |0.5372807100387846 |0.44218570467529394N(x; μ = 49.50503062243786, σ² = 0.0009954477334487531) |49.36330799998817 |49.37124399998651 |3.529550818615057E-06 |1.115772521409486E-05N(x; μ = 49.505720877539765, σ² = 0.0008257970106087029) |49.37231699998847 |49.386660999990106 |1.7228855271333998E-06 |1.712972513601141E-05N(x; μ = 49.49994663264188, σ² = 0.0007518765247716318) |49.50191999998847 |49.474235999989205 |0.5286859991636343 |0.17421285127499514

这是我的问题.System.Random 及其引入的偏差是怎么回事?

解决方案

.NET 中(包括).NET 5 之前的默认 RNG 存在已知的偏差和性能问题,此处记录最多 https://github.com/dotnet/runtime/issues/23198:

  • Donald E. Knuth 的减法随机数生成器实现中的一个拼写错误,实际效果未知.
  • 具有未知实际效果的不同模数(2^32-1 而不是 2 的幂).
  • Next(0, int.MaxValue) 有很大的偏差.
  • NextDouble() 只产生 2^31 个可能的值,它可以从大约2^62 个不同的值.

这就是 .NET 6 实现更好算法的原因 (xoshiro256**).当您在没有种子的情况下实例化 new Random() 实例时,您将获得更好的 RNG.这在 https://github.com/dotnet/runtime/pull/47085.不幸的是,在提供种子时替换旧的 RNG 并不容易,因为人们可能会依赖当前有偏见的 RNG 的行为.

即使 xoshiro256** 有一些已记录的缺陷(以及反驳),我发现它非常适合我的目的.我已复制改进了 .NET 6 的实现并使用它.

旁注:LINQ 查询是惰性求值的(又名延迟执行").如果您在 .OrderBy lambda 中使用 RNG,如果您迭代多次,您可能会得到令人困惑的结果,因为每次都可能更改顺序.一些排序算法依赖于这样一个事实,即元素不会突然改变它们的相对顺序才能正常工作.返回不一致的排序值会破坏这种排序算法.当然,今天在 LINQ-to-Objects 中的 OrderBy 实现工作正常,但没有文件保证它必须与随机"变化的值一起工作.一个合理的选择是 .OrderBy(e => HashCode.Combine(0x1337, e)).

I was looking at a question that was talking about a bad implementation of the Fisher-Yates shuffling algorithm and I was perplexed that there was a bias when implemented incorrectly.

The two algorithms are these:

private Random _random = new Random();

public int[] FisherYates(int[] source)
{
    int[] output = source.ToArray();
    for (var i = 0; i < output.Length; i++)
    {
        var j = _random.Next(i, output.Length);
        (output[i], output[j]) = (output[j], output[i]);
    }
    return output;
}

public int[] FisherYatesBad(int[] source)
{
    int[] output = source.ToArray();
    for (var i = 0; i < output.Length; i++)
    {
        var j = _random.Next(0, output.Length);
        (output[i], output[j]) = (output[j], output[i]);
    }
    return output;
}

A really subtle different, but enough to cause a massive bias.

Good implementation:

Bad implementation:

To be clear about these plots, I start with the numbers 0 to 99, create 10_000_000 shuffles using whichever algorithm and then I average the values in each of the shuffles to get a single set of figures. If the shuffle is trying random then all 100 figures would belong to the same normal distribution.

Now, that's all fine, but I thought I'd check to see if these methods produce valid results:

public int[] OrderByRandomNext(int[] source) => source.OrderBy(x => _random.Next()).ToArray();

public int[] OrderByRandomNextDouble(int[] source) => source.OrderBy(x => _random.NextDouble()).ToArray();

Both nice a neat, but are they a fair shuffle?

OrderByRandomNext:

OrderByRandomNextDouble:

Notice that the 1 and the 100 figures are significantly lower in each?

Well, I thought it might be an artefact of how OrderBy works. So I tested it with another random number generator - one brought to use from Eric Lippert in his improving Random series.

public int[] OrderByBetterRandomNextDouble(int[] source) => source.OrderBy(x => BetterRandom.NextDouble()).ToArray();

public static class BetterRandom
{
    private static readonly ThreadLocal<RandomNumberGenerator> crng =
        new ThreadLocal<RandomNumberGenerator>(RandomNumberGenerator.Create);

    private static readonly ThreadLocal<byte[]> bytes =
        new ThreadLocal<byte[]>(() => new byte[sizeof(int)]);

    public static int NextInt()
    {
        crng.Value.GetBytes(bytes.Value);
        return BitConverter.ToInt32(bytes.Value, 0) & int.MaxValue;
    }

    public static double NextDouble()
    {
        while (true)
        {
            long x = NextInt() & 0x001FFFFF;
            x <<= 31;
            x |= (long)NextInt();
            double n = x;
            const double d = 1L << 52;
            double q = n / d;
            if (q != 1.0)
                return q;
        }
    }
}

Well, here's the chart:

There's no bias!

Here's my code to generate the data (run in LINQPad):

void Main()
{
    var n = 100;
    var s = 1000000;

    var numbers = Enumerable.Range(0, n).ToArray();

    var algorithms = new Func<int[], int[]>[]
    {
        FisherYates,
        OrderByRandomNext,
        OrderByRandomNextDouble,
        OrderByBetterRandomNextDouble,
    };

    var averages =
        algorithms
            .Select(algorithm =>
                Enumerable
                    .Range(0, numbers.Length)
                    .Select(x =>
                        Enumerable
                            .Range(0, s)
                            .Select(y => algorithm(numbers))
                            .Aggregate(0.0, (a, v) => a + (double)v[x] / s))
                    .ToArray())
            .Select(x => new
            {
                averages = x,
                distribution = Accord.Statistics.Distributions.Univariate.NormalDistribution.Estimate(x.Skip(1).SkipLast(1).ToArray()),
                first = x.First(),
                last = x.Last(),
            })
            .Select(x => new
            {
                x.averages,
                x.distribution,
                x.first,
                x.last,
                first_prob =x.distribution.DistributionFunction(x.first),
                last_prob = x.distribution.DistributionFunction(x.last),
            })
            .ToArray();

    var d = 

    averages.Dump();
}

private Random _random = new Random();

    public int[] FisherYates(int[] source)
    {
        int[] output = source.ToArray();
        for (var i = 0; i < output.Length; i++)
        {
            var j = _random.Next(i, output.Length);
            (output[i], output[j]) = (output[j], output[i]);
        }
        return output;
    }

public int[] OrderByRandomNext(int[] source) => source.OrderBy(x => _random.Next()).ToArray();

public int[] OrderByRandomNextDouble(int[] source) => source.OrderBy(x => _random.NextDouble()).ToArray();

    public int[] OrderByBetterRandomNextDouble(int[] source) => source.OrderBy(x => BetterRandom.NextDouble()).ToArray();

    public static class BetterRandom
    {
        private static readonly ThreadLocal<RandomNumberGenerator> crng =
            new ThreadLocal<RandomNumberGenerator>(RandomNumberGenerator.Create);

        private static readonly ThreadLocal<byte[]> bytes =
            new ThreadLocal<byte[]>(() => new byte[sizeof(int)]);

        public static int NextInt()
        {
            crng.Value.GetBytes(bytes.Value);
            return BitConverter.ToInt32(bytes.Value, 0) & int.MaxValue;
        }

        public static double NextDouble()
        {
            while (true)
            {
                long x = NextInt() & 0x001FFFFF;
                x <<= 31;
                x |= (long)NextInt();
                double n = x;
                const double d = 1L << 52;
                double q = n / d;
                if (q != 1.0)
                    return q;
            }
        }
    }

Here's the data that I generated:

distribution                                             | first              | last               | first_prob             | last_prob            
-------------------------------------------------------- | ------------------ | ------------------ | ---------------------- | ---------------------
N(x; μ = 49.50267467345823, σ² = 0.0008896228453062147)  | 49.505465999987585 | 49.49833699998965  | 0.5372807100387846     | 0.44218570467529394  
N(x; μ = 49.50503062243786, σ² = 0.0009954477334487531)  | 49.36330799998817  | 49.37124399998651  | 3.529550818615057E-06  | 1.115772521409486E-05
N(x; μ = 49.505720877539765, σ² = 0.0008257970106087029) | 49.37231699998847  | 49.386660999990106 | 1.7228855271333998E-06 | 1.712972513601141E-05
N(x; μ = 49.49994663264188, σ² = 0.0007518765247716318)  | 49.50191999998847  | 49.474235999989205 | 0.5286859991636343     | 0.17421285127499514  

Here's my question. What's up with System.Random and the bias it introduces?

解决方案

The default RNG in .NET up to (and including) .NET 5 has known bias and performance issues, most documented here https://github.com/dotnet/runtime/issues/23198:

  • A typo in the implementation of Donald E. Knuth's subtractive random number generator with unknown practical effects.
  • A different modulo (2^32-1 instead of a power of two) with unknown practical effects.
  • Next(0, int.MaxValue) has heavy bias.
  • NextDouble() only produces 2^31 possible values, where it could pick from approx. 2^62 distinct values.

This is why .NET 6 implemented a better algorithm (xoshiro256**). You will get this better RNG when you instantiate a new Random() instance without a seed. This is described in https://github.com/dotnet/runtime/pull/47085. Unfortunately, it's not easy to replace the old RNG when a seed is provided since people might rely on the behavior of the current, biased RNG.

Even though xoshiro256** has some documented flaws (and a rebuttal) as well, I found it to work pretty well for my purposes. I have copied the improved implementation from .NET 6 and use that.

Side-note: LINQ queries are lazily evaluated (a.k.a. "deferred execution"). If you use a RNG in the .OrderBy lambda, you might get confusing results if you iterate multiple times, because the order might change every time. Some sorting algorithms rely on the fact that elements won't suddenly change their relative order to work correctly. Returning inconsistent ordering values will break such sorting algorithms. Sure, todays OrderBy implementation in LINQ-to-Objects works fine, but there's no documented guarantee that it has to work with "randomly"-changing values. A reasonable alternative is .OrderBy(e => HashCode.Combine(0x1337, e)).

这篇关于.Net 的“Random"类中的错误?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆