Why does this simple shuffle algorithm produce biased results? What is a simple reason?

Question

It seems that this simple shuffle algorithm produces biased results:

# suppose $arr is filled with 1 to 52

for ($i = 0; $i < 52; $i++) {
  $j = rand(0, 51);   # any position, including ones already visited

  # swap the items
  $tmp = $arr[$j];
  $arr[$j] = $arr[$i];
  $arr[$i] = $tmp;
}

You can try it: instead of 52, use 3 (suppose only 3 cards are used), run it 10,000 times and tally up the results. You will see that the results are skewed towards certain patterns.
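The experiment described above can be sketched as follows. This is a minimal Python sketch rather than the asker's PHP; the names `naive_shuffle` and `tally` are hypothetical, and `rng.randrange(n)` is assumed to play the role of `rand(0, n - 1)`.

```python
import random
from collections import Counter


def naive_shuffle(cards, rng):
    """Biased shuffle from the question: position i swaps with ANY position."""
    cards = list(cards)
    n = len(cards)
    for i in range(n):
        j = rng.randrange(n)  # 0 <= j < n, like rand(0, n - 1)
        cards[i], cards[j] = cards[j], cards[i]
    return cards


def tally(trials, seed=None):
    """Shuffle [1, 2, 3] `trials` times and count each resulting order."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        counts["".join(map(str, naive_shuffle([1, 2, 3], rng)))] += 1
    return counts


if __name__ == "__main__":
    for pattern, count in sorted(tally(10_000).items()):
        print(pattern, count)
```

Exact counts vary from run to run, but with enough trials three of the six orders should show up noticeably more often than the other three.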

The question is: what is a simple explanation for why this happens?

The correct solution is to use something like:

for ($i = 0; $i < 51; $i++) {  # last card need not swap
  $j = rand($i, 51);           # don't touch the cards that have already "settled"

  # swap the items
  $tmp = $arr[$j];
  $arr[$j] = $arr[$i];
  $arr[$i] = $tmp;
}
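This corrected loop is commonly known as the Fisher-Yates shuffle. As a rough check of why it is unbiased, the Python sketch below (the function name `fisher_yates_outcomes` is hypothetical) walks through every possible choice of `$j` at each step and counts the final orders. With 3 cards there are only 3 × 2 = 6 equally likely paths, one per order.

```python
from collections import Counter
from itertools import product


def fisher_yates_outcomes(n):
    """Count the final orders over every possible choice of j at each step."""
    counts = Counter()
    # Step i may pick j from i..n-1; the last card needs no swap.
    choices = [range(i, n) for i in range(n - 1)]
    for js in product(*choices):
        cards = list(range(1, n + 1))
        for i, j in enumerate(js):
            cards[i], cards[j] = cards[j], cards[i]
        counts[tuple(cards)] += 1
    return counts


if __name__ == "__main__":
    for order, count in sorted(fisher_yates_outcomes(3).items()):
        print(order, count)
```

Every order is reached by exactly one path, so each is equally likely as long as the random number generator itself is uniform.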

But the question is: why does the first method, which also seems totally random, make the results biased?

Update 1: Thanks to the folks here for pointing out that it needs to be rand($i, 51) for it to shuffle correctly.

Recommended answer

Here's the complete probability tree for these replacements.

Let's assume that you start with the sequence 123, and then enumerate all the ways the code in question can produce a result.

123
 +- 123          - swap 1 and 1 (these are positions,
 |   +- 213      - swap 2 and 1  not numbers)
 |   |   +- 312  - swap 3 and 1
 |   |   +- 231  - swap 3 and 2
 |   |   +- 213  - swap 3 and 3
 |   +- 123      - swap 2 and 2
 |   |   +- 321  - swap 3 and 1
 |   |   +- 132  - swap 3 and 2
 |   |   +- 123  - swap 3 and 3
 |   +- 132      - swap 2 and 3
 |       +- 231  - swap 3 and 1
 |       +- 123  - swap 3 and 2
 |       +- 132  - swap 3 and 3
 +- 213          - swap 1 and 2
 |   +- 123      - swap 2 and 1
 |   |   +- 321  - swap 3 and 1
 |   |   +- 132  - swap 3 and 2
 |   |   +- 123  - swap 3 and 3
 |   +- 213      - swap 2 and 2
 |   |   +- 312  - swap 3 and 1
 |   |   +- 231  - swap 3 and 2
 |   |   +- 213  - swap 3 and 3
 |   +- 231      - swap 2 and 3
 |       +- 132  - swap 3 and 1
 |       +- 213  - swap 3 and 2
 |       +- 231  - swap 3 and 3
 +- 321          - swap 1 and 3
     +- 231      - swap 2 and 1
     |   +- 132  - swap 3 and 1
     |   +- 213  - swap 3 and 2
     |   +- 231  - swap 3 and 3
     +- 321      - swap 2 and 2
     |   +- 123  - swap 3 and 1
     |   +- 312  - swap 3 and 2
     |   +- 321  - swap 3 and 3
     +- 312      - swap 2 and 3
         +- 213  - swap 3 and 1
         +- 321  - swap 3 and 2
         +- 312  - swap 3 and 3

Now, the fourth column of numbers, the one just before the swap information, contains the final outcomes: one for each of the 27 equally likely paths.

Let's count how many times each pattern occurs:

123 - 4 times
132 - 5 times
213 - 5 times
231 - 5 times
312 - 4 times
321 - 4 times
=============
     27 times total

If you run the randomly swapping code a very large number of times, the patterns 132, 213 and 231 will occur more often than the patterns 123, 312 and 321 (5/27 versus 4/27 each), simply because the way the code swaps makes them more likely. The 27 equally likely paths cannot be split evenly among 6 patterns, so some patterns must come up more often than others.
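The counting argument can be checked for other deck sizes. The Python sketch below (the function name `paths_and_orders` is hypothetical) compares the number of equally likely swap sequences, n^n, with the number of distinct orders, n!. A nonzero remainder means the sequences cannot be divided evenly among the orders, so the shuffle cannot be uniform.

```python
from math import factorial


def paths_and_orders(n):
    """Return (swap sequences, distinct orders, remainder) for n cards."""
    paths = n ** n          # n choices of j at each of n steps
    orders = factorial(n)   # number of distinct arrangements
    return paths, orders, paths % orders


if __name__ == "__main__":
    for n in range(2, 8):
        paths, orders, remainder = paths_and_orders(n)
        print(n, paths, orders, remainder)
```

For 3 cards this gives 27 sequences over 6 orders with remainder 3. The remainder is zero only for n = 2; for any larger n, n! contains the factor n - 1, which shares no factor with n and so never divides n^n.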

Of course, if you run the code only 30 times (27 + 3), you could end up with every pattern occurring 5 times, but when dealing with statistics you have to look at the long-term trend.

Here's C# code that enumerates every possible sequence of swaps and counts how often each pattern results:

using System;
using System.Collections.Generic;

class Program
{
    static void Main(string[] args)
    {
        Dictionary<String, Int32> occurances = new Dictionary<String, Int32>
        {
            { "123", 0 },
            { "132", 0 },
            { "213", 0 },
            { "231", 0 },
            { "312", 0 },
            { "321", 0 }
        };

        Char[] digits = new[] { '1', '2', '3' };
        Func<Char[], Int32, Int32, Char[]> swap = delegate(Char[] input, Int32 pos1, Int32 pos2)
        {
            Char[] result = new Char[] { input[0], input[1], input[2] };
            Char temp = result[pos1];
            result[pos1] = result[pos2];
            result[pos2] = temp;
            return result;
        };

        for (Int32 index1 = 0; index1 < 3; index1++)
        {
            Char[] level1 = swap(digits, 0, index1);
            for (Int32 index2 = 0; index2 < 3; index2++)
            {
                Char[] level2 = swap(level1, 1, index2);
                for (Int32 index3 = 0; index3 < 3; index3++)
                {
                    Char[] level3 = swap(level2, 2, index3);
                    String output = new String(level3);
                    occurances[output]++;
                }
            }
        }

        foreach (var kvp in occurances)
        {
            Console.Out.WriteLine(kvp.Key + ": " + kvp.Value);
        }
    }
}

Output:

123: 4
132: 5
213: 5
231: 5
312: 4
321: 4

So while this answer does in fact count, it's not a purely mathematical answer: you just have to evaluate every way the random function can go and look at the final outputs.
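The same exhaustive evaluation can be written for any number of cards. The following is a minimal Python sketch, not a port of the answer's program; the function name `naive_shuffle_distribution` is hypothetical. It returns the exact probability of each final order as a fraction.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def naive_shuffle_distribution(n):
    """Exact probability of each final order under the biased shuffle."""
    counts = Counter()
    for js in product(range(n), repeat=n):  # every choice of j per step
        cards = list(range(1, n + 1))
        for i, j in enumerate(js):
            cards[i], cards[j] = cards[j], cards[i]
        counts[tuple(cards)] += 1
    total = n ** n
    return {order: Fraction(c, total) for order, c in counts.items()}


if __name__ == "__main__":
    for order, p in sorted(naive_shuffle_distribution(3).items()):
        print("".join(map(str, order)), p)
```

For 3 cards this reproduces the 4/27 and 5/27 split counted above. The work grows as n^n, so this is only practical for small decks.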
