Why does this simple shuffle algorithm produce biased results? What is a simple reason?


Question

It seems that this simple shuffle algorithm produces biased results:

# suppose $arr is filled with 1 to 52

for ($i = 0; $i < 52; $i++) {
  $j = rand(0, 51);   # any position in the deck, including already-visited ones

  # swap the items

  $tmp = $arr[$j];
  $arr[$j] = $arr[$i];
  $arr[$i] = $tmp;
}

You can try it: instead of 52, use 3 (suppose only 3 cards are used), run it 10,000 times, and tally the results. You will see that the results are skewed towards certain patterns.
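The experiment described above can be sketched as follows. This is a minimal Python version, not the asker's original code: the names `naive_shuffle` and `tally` and the fixed seed are assumptions made for illustration.

```python
import random
from collections import Counter


def naive_shuffle(cards, rng):
    """Biased shuffle from the question: every position swaps with ANY position."""
    cards = list(cards)
    n = len(cards)
    for i in range(n):
        j = rng.randrange(n)  # j is drawn from the whole deck, 0..n-1
        cards[i], cards[j] = cards[j], cards[i]
    return cards


def tally(shuffle, trials, seed=0):
    """Shuffle [1, 2, 3] `trials` times and count each resulting order."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    counts = Counter()
    for _ in range(trials):
        counts["".join(map(str, shuffle([1, 2, 3], rng)))] += 1
    return counts


if __name__ == "__main__":
    # Print how often each of the six orders came up in 10,000 runs.
    for pattern, count in sorted(tally(naive_shuffle, 10_000).items()):
        print(pattern, count)
```

With enough trials, three of the six orders should come up noticeably more often than the other three, rather than each landing near one sixth of the runs.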

The question is: what is a simple explanation for why this happens?

The correct solution is to use something like:

for ($i = 0; $i < 51; $i++) {  # the last card needs no swap
  $j = rand($i, 51);        # don't touch the cards that have already "settled"

  # swap the items

  $tmp = $arr[$j];
  $arr[$j] = $arr[$i];
  $arr[$i] = $tmp;
}
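As a rough check of the corrected loop, here is a Python sketch of the same idea (commonly known as the Fisher-Yates shuffle). The function names are hypothetical. `enumerate_fisher_yates` walks every possible sequence of `j` choices instead of sampling, so the result does not depend on the random number generator.

```python
import random
from collections import Counter
from itertools import product


def fisher_yates_shuffle(cards, rng):
    """Corrected shuffle: position i only swaps with positions i..n-1."""
    cards = list(cards)
    n = len(cards)
    for i in range(n - 1):       # the last card needs no swap
        j = rng.randrange(i, n)  # never reach back into settled positions
        cards[i], cards[j] = cards[j], cards[i]
    return cards


def enumerate_fisher_yates(n):
    """Follow every possible sequence of j choices and count the final orders."""
    counts = Counter()
    # Step i may pick j from i..n-1, so there are n * (n-1) * ... * 2 paths.
    for choices in product(*(range(i, n) for i in range(n - 1))):
        cards = list(range(1, n + 1))
        for i, j in enumerate(choices):
            cards[i], cards[j] = cards[j], cards[i]
        counts[tuple(cards)] += 1
    return counts


if __name__ == "__main__":
    print(fisher_yates_shuffle(range(1, 53), random.Random()))
    for order, count in sorted(enumerate_fisher_yates(3).items()):
        print(order, count)
```

For three cards there are 3 × 2 = 6 paths, and each path ends in a different order, so every order is reached exactly once.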

But the question is: why does the first method, which also seems totally random, produce biased results?

Update 1: thanks to the folks here for pointing out that it needs to be rand($i, 51) for the shuffle to be correct.

Recommended answer

Here's the complete probability tree for these swaps.

Let's assume that you start with the sequence 123, and then enumerate all the ways the code in question can produce a result.

123
 +- 123          - swap 1 and 1 (these are positions,
 |   +- 213      - swap 2 and 1  not numbers)
 |   |   +- 312  - swap 3 and 1
 |   |   +- 231  - swap 3 and 2
 |   |   +- 213  - swap 3 and 3
 |   +- 123      - swap 2 and 2
 |   |   +- 321  - swap 3 and 1
 |   |   +- 132  - swap 3 and 2
 |   |   +- 123  - swap 3 and 3
 |   +- 132      - swap 2 and 3
 |       +- 231  - swap 3 and 1
 |       +- 123  - swap 3 and 2
 |       +- 132  - swap 3 and 3
 +- 213          - swap 1 and 2
 |   +- 123      - swap 2 and 1
 |   |   +- 321  - swap 3 and 1
 |   |   +- 132  - swap 3 and 2
 |   |   +- 123  - swap 3 and 3
 |   +- 213      - swap 2 and 2
 |   |   +- 312  - swap 3 and 1
 |   |   +- 231  - swap 3 and 2
 |   |   +- 213  - swap 3 and 3
 |   +- 231      - swap 2 and 3
 |       +- 132  - swap 3 and 1
 |       +- 213  - swap 3 and 2
 |       +- 231  - swap 3 and 3
 +- 321          - swap 1 and 3
     +- 231      - swap 2 and 1
     |   +- 132  - swap 3 and 1
     |   +- 213  - swap 3 and 2
     |   +- 231  - swap 3 and 3
     +- 321      - swap 2 and 2
     |   +- 123  - swap 3 and 1
     |   +- 312  - swap 3 and 2
     |   +- 321  - swap 3 and 3
     +- 312      - swap 2 and 3
         +- 213  - swap 3 and 1
         +- 321  - swap 3 and 2
         +- 312  - swap 3 and 3

Now, the fourth column of numbers, the one just before the swap information, contains the final outcomes. There are 27 of them, one for each equally likely path through the tree.

Let's count how many times each pattern occurs:

123 - 4 times
132 - 5 times
213 - 5 times
231 - 5 times
312 - 4 times
321 - 4 times
=============
     27 times total

If you run the random-swap code a large number of times, the patterns 132, 213 and 231 will occur more often than the patterns 123, 312 and 321, simply because more of the 27 equally likely paths end in them. The 27 paths cannot be divided evenly among 6 patterns, so some patterns must come up more often than others.
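The same counting argument can be tried for other deck sizes. The sketch below is in Python, and `enumerate_naive` is a hypothetical helper name. It walks all n^n sequences of swap choices made by the biased loop and compares that number with the n! possible orders.

```python
from collections import Counter
from itertools import product
from math import factorial


def enumerate_naive(n):
    """Follow every sequence of j choices of the biased loop; count final orders."""
    counts = Counter()
    # Each of the n steps picks j from 0..n-1, giving n ** n equally likely paths.
    for choices in product(range(n), repeat=n):
        cards = list(range(1, n + 1))
        for i, j in enumerate(choices):
            cards[i], cards[j] = cards[j], cards[i]
        counts[tuple(cards)] += 1
    return counts


if __name__ == "__main__":
    for n in (3, 4):
        counts = enumerate_naive(n)
        paths, orders = n ** n, factorial(n)
        # A nonzero remainder means the paths cannot be shared out evenly.
        print(f"n={n}: {paths} paths, {orders} orders, remainder {paths % orders}")
        print("  fewest paths:", min(counts.values()),
              "most paths:", max(counts.values()))
```

For n = 3 this reproduces the 4 and 5 counts from the tree. For n = 4 there are 256 paths and 24 orders, and 256 is not a multiple of 24, so the bias is still there.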

Now, of course, you could say that if you run the code 30 times (27 + 3), you might end up with every pattern occurring 5 times, but when dealing with statistics you have to look at the long-term trend.

Here's C# code that enumerates every possible sequence of random choices and counts the resulting patterns:

using System;
using System.Collections.Generic;

class Program
{
    static void Main(string[] args)
    {
        Dictionary<String, Int32> occurances = new Dictionary<String, Int32>
        {
            { "123", 0 },
            { "132", 0 },
            { "213", 0 },
            { "231", 0 },
            { "312", 0 },
            { "321", 0 }
        };

        Char[] digits = new[] { '1', '2', '3' };
        Func<Char[], Int32, Int32, Char[]> swap = delegate(Char[] input, Int32 pos1, Int32 pos2)
        {
            Char[] result = new Char[] { input[0], input[1], input[2] };
            Char temp = result[pos1];
            result[pos1] = result[pos2];
            result[pos2] = temp;
            return result;
        };

        for (Int32 index1 = 0; index1 < 3; index1++)
        {
            Char[] level1 = swap(digits, 0, index1);
            for (Int32 index2 = 0; index2 < 3; index2++)
            {
                Char[] level2 = swap(level1, 1, index2);
                for (Int32 index3 = 0; index3 < 3; index3++)
                {
                    Char[] level3 = swap(level2, 2, index3);
                    String output = new String(level3);
                    occurances[output]++;
                }
            }
        }

        foreach (var kvp in occurances)
        {
            Console.Out.WriteLine(kvp.Key + ": " + kvp.Value);
        }
    }
}

This outputs:

123: 4
132: 5
213: 5
231: 5
312: 4
321: 4

So while this answer does in fact count the outcomes, it's not a purely mathematical answer: you just have to evaluate all the possible ways the random function can go and look at the final outputs.
