Possible distributions and their probabilities after putting identical items into anonymous buckets


Question



Apologies if the answer to this is readily found elsewhere. My math and stats are weak and thus I don't even know the search terms for what I'm trying to do . . .

I have b anonymous indistinguishable buckets into which I'm putting i identical items. I want to know all possible distributions and their probabilities. For example, if I have 3 buckets and 3 items, the answer I want is:

  • [3,0,0] -> 1/9
  • [2,1,0] -> 6/9
  • [1,1,1] -> 2/9

Notice that the buckets are anonymous and thus I want identical distributions to be combined as I have done above. For example, the [2,1,0] case is actually the sum of the [2,1,0], [0,2,1], etc. cases.

In addition, I have the constraint of a max bucket size. For example, 3 balls, 3 buckets, bucketsize=2 should return:

  • [2,1,0] prob=7/9
  • [1,1,1] prob=2/9

This can be seen looking at a probability tree:

Insert item 1 into [0,0,0] -> [1,0,0] p=1  
Insert item 2 into [1,0,0] -> [2,0,0] p=1/3 OR [1,1,0] 2/3  
Insert item 3 into [2,0,0] -> [2,1,0] p=1.0   
Insert item 3 into [1,1,0] -> [2,1,0] p=2/3 OR [1,1,1] p=1/3  

So state [2,1,0] has two paths to it: 1/3*1 AND 2/3*2/3 = 7/9  
So state [1,1,1] has one path  to it: 2/3 * 1/3 = 2/9

Here's another view of the probabilities for each (image "bins_and_balls"): http://bents.us/Pictures/bins_balls.png

Thanks very much for reading my question and for any help you can provide. I have not cross-posted this on https://stats.stackexchange.com/ but if people think it is better there, then I'll delete this one and repost it there.

UPDATE

There has been some discussion in the comments about the correctness of some of the proposed algorithms. To help validate, I wrote the following simulator:

#! /usr/bin/env python

from __future__ import division
import random

def simulate(num_bucks,items,bsize,iterations=50000):
  perms = dict()
  for n in range(iterations):
    buckets = [0] * num_bucks 
    for i in range(items):
      while True:
        b = random.randint(0,num_bucks-1)
        if buckets[b] < bsize:
          break  # kludge, loop until we find an unfilled bucket 
      buckets[b] +=1
    buckets.sort()
    buckets = tuple(reversed(buckets))
    try:
      perms[buckets]['count'] += 1
    except KeyError:
      perms[buckets] = {'perm' : buckets, 'count' : 1}

  for perm in perms.values(): 
    perm['prob'] = perm['count'] / iterations
  return perms


def main():
  perms =    simulate(num_bucks=3,items=3,bsize=2)
  for perm in perms.values():
    print(perm)

if __name__ == "__main__":
    main()

which, for 3 buckets, 3 balls, and a bucket size of 2, produces output like:

(1, 1, 1) 0.22394
(2, 1, 0) 0.77606

Solution

Overview

When distributing e.g. 3 items over 4 buckets, you can put the first item into any of the 4 buckets, then for each of these 4 possibilities you can put the second item into any of the 4 buckets, and then for each of these 16 possibilities you can put the third item into any of the 4 buckets, to make a total of 64 possibilities:

(3,0,0,0) (2,1,0,0) (2,0,1,0) (2,0,0,1) (2,1,0,0) (1,2,0,0) (1,1,1,0) (1,1,0,1) (2,0,1,0) (1,1,1,0) (1,0,2,0) (1,0,1,1) (2,0,0,1) (1,1,0,1) (1,0,1,1) (1,0,0,2)
(2,1,0,0) (1,2,0,0) (1,1,1,0) (1,1,0,1) (1,2,0,0) (0,3,0,0) (0,2,1,0) (0,2,0,1) (1,1,1,0) (0,2,1,0) (0,1,2,0) (0,1,1,1) (1,1,0,1) (0,2,0,1) (0,1,1,1) (0,1,0,2)
(2,0,1,0) (1,1,1,0) (1,0,2,0) (1,0,1,1) (1,1,1,0) (0,2,1,0) (0,1,2,0) (0,1,1,1) (1,0,2,0) (0,1,2,0) (0,0,3,0) (0,0,2,1) (1,0,1,1) (0,1,1,1) (0,0,2,1) (0,0,1,2)
(2,0,0,1) (1,1,0,1) (1,0,1,1) (1,0,0,2) (1,1,0,1) (0,2,0,1) (0,1,1,1) (0,1,0,2) (1,0,1,1) (0,1,1,1) (0,0,2,1) (0,0,1,2) (1,0,0,2) (0,1,0,2) (0,0,1,2) (0,0,0,3)

You will notice that there are many duplicates in this list, e.g. (0,1,0,2) appears 3 times. That is because the 3 items are anonymous and identical, so there are several ways in which you can distribute the 3 items over the 4 buckets and arrive at the same result, e.g.:

(0,0,0,0) → (0,1,0,0) → (0,1,0,1) → (0,1,0,2)
(0,0,0,0) → (0,0,0,1) → (0,1,0,1) → (0,1,0,2)
(0,0,0,0) → (0,0,0,1) → (0,0,0,2) → (0,1,0,2)

Also, because the buckets are anonymous and identical, permutations like (2,0,1,0) and (0,1,2,0) are considered equal. Therefore, the 64 possibilities are grouped together to form these 3 classes:

(3,0,0,0) x4
(2,1,0,0) x36
(1,1,1,0) x24

The probability for each of these classes is then the number of possibilities that are part of the class, divided by the total number of possibilities:

(3,0,0,0) → 4 / 64 = 1 / 16
(2,1,0,0) → 36 / 64 = 9 / 16
(1,1,1,0) → 24 / 64 = 6 / 16

An algorithm which generates all the possibilities and then groups them together into classes is not very efficient, because the number of possibilities quickly becomes huge for higher numbers of items and buckets, as you have experienced.
It is better to generate the classes (step 1 below), calculate the number of permutations per class (step 2 below), calculate the number of duplicate distributions per permutation (step 3 below), and then combine the results to get the probability for each class (step 4 below).
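For small cases, the grouping described above can be checked by brute force; a minimal Python sketch (the function name is my own) that enumerates all buckets^items assignments and tallies them by class:

```python
from itertools import product
from collections import Counter

def brute_force(items, buckets):
    # Enumerate every way of assigning each item to a bucket
    # (buckets^items possibilities), then group assignments whose
    # sorted bucket counts match, i.e. merge anonymous-bucket duplicates.
    classes = Counter()
    for assignment in product(range(buckets), repeat=items):
        counts = [0] * buckets
        for b in assignment:
            counts[b] += 1
        classes[tuple(sorted(counts, reverse=True))] += 1
    return classes

print(brute_force(3, 4))
# Counter({(2, 1, 0, 0): 36, (1, 1, 1, 0): 24, (3, 0, 0, 0): 4})
```

This reproduces the 4/36/24 class sizes above, but becomes infeasible quickly, which is why the stepwise algorithm below is preferable.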

Step 1: generating the partitions

First, generate the partitions (or more precisely: partitions with restricted number of parts) of i with b parts:

(3,0,0,0) (2,1,0,0) (1,1,1,0)

The basic way to generate partitions is to use a recursive algorithm, where you choose all possible values for the first part, and then recurse with the rest, e.g.:

partition(4) :
• 4
• 3 + partition(1)
• 2 + partition(2)
• 1 + partition(3)
→ (4) (3,1) (2,2) (2,1,1) (1,3) (1,2,1) (1,1,2) (1,1,1,1)

In order to avoid duplicates, only generate non-increasing sequences of values. To achieve this, you have to tell every recursion what the maximum value of its parts is, e.g.:

partition(4, max=4) :
• 4
• 3 + partition(1, max=3)
• 2 + partition(2, max=2)
• 1 + partition(3, max=1)
→ (4) (3,1) (2,2) (2,1,1) (1,1,1,1)

When partitioning with restricted number of parts, the first part should not be less than the total value divided by the number of parts, e.g. when partitioning 4 into 3 parts, there's no partition with first part 1, because 1 < 4/3:

partition(4, parts=3, max=4) :
• 4,0,0
• 3 + partition(1, parts=2, max=3)
• 2 + partition(2, parts=2, max=2)
→ (4,0,0) (3,1,0) (2,2,0) (2,1,1)

When the number of parts is 2, this is the lowest recursion level. The partitions are then generated by iterating over the values for the first part, from the maximum down to half of the value, and putting the rest in the second part, e.g.:

partition(4, parts=2, max=3) :
• 3,1
• 2,2
→ (3,1) (2,2)

Since a recursive partitioning algorithm reduces each task to a number of tasks with smaller values and fewer parts, the same partitions with small values are generated many times. That's why techniques like memoization and tabulation can speed up the algorithm significantly.
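The recursion described above can be sketched in Python as follows (a slightly simplified version, without memoization, that bottoms out at one part instead of two; the function name is my own):

```python
def partitions(total, parts, max_part=None):
    # Generate all non-increasing sequences of `parts` non-negative
    # integers summing to `total`, i.e. partitions with a fixed
    # number of parts, zero-padded.
    if max_part is None:
        max_part = total
    if parts == 1:
        return [[total]] if total <= max_part else []
    result = []
    # The first part cannot be below total/parts, otherwise the
    # remaining (non-increasing) parts could not make up the rest.
    lo = -(-total // parts)  # ceil(total / parts)
    for first in range(min(total, max_part), lo - 1, -1):
        # Recurse with the first part as the new maximum, so only
        # non-increasing sequences (no duplicates) are generated.
        for rest in partitions(total - first, parts - 1, first):
            result.append([first] + rest)
    return result

print(partitions(3, 4))  # [[3, 0, 0, 0], [2, 1, 0, 0], [1, 1, 1, 0]]
```

For the worked example in the text, `partitions(4, 3)` yields (4,0,0), (3,1,0), (2,2,0) and (2,1,1), matching the expansion shown above.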

Step 2: calculating the number of permutations

Then, find the number of permutations of each partition:

(3,0,0,0) (0,3,0,0) (0,0,3,0) (0,0,0,3) → 4
(2,1,0,0) (2,0,1,0) (2,0,0,1) (1,2,0,0) (0,2,1,0) (0,2,0,1) (1,0,2,0) (0,1,2,0) (0,0,2,1) (1,0,0,2) (0,1,0,2) (0,0,1,2) → 12
(1,1,1,0) (1,1,0,1) (1,0,1,1) (0,1,1,1) → 4

You don't need to generate all these permutations to calculate how many there are; you only have to check how many duplicate values there are in every partition generated in step 1:

(3,0,0,0) (2,1,0,0) (1,1,1,0)

Take the factorial of the number of buckets b (4!) and divide it by the product of the factorial of the number of duplicate values:

(3,0,0,0) → 4! / (1! × 3!) = 4
(2,1,0,0) → 4! / (1! × 1! × 2!) = 12
(1,1,1,0) → 4! / (3! × 1!) = 4
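The same count can be computed in a few lines of Python (a sketch; the function name is my own):

```python
from math import factorial
from collections import Counter

def permutations_count(partition):
    # Number of distinct orderings of the partition: b! divided by
    # the factorial of each value's multiplicity.
    result = factorial(len(partition))
    for multiplicity in Counter(partition).values():
        result //= factorial(multiplicity)
    return result

print(permutations_count((3, 0, 0, 0)))  # 4
print(permutations_count((2, 1, 0, 0)))  # 12
print(permutations_count((1, 1, 1, 0)))  # 4
```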

Step 3: calculating the number of distributions

Then, for each partition, we have to find the number of ways in which you can distribute the items in the buckets, e.g. for partition (2,1,0,0):

• first item in bucket 1, second item in bucket 1, third item in bucket 2 → (2,1,0,0)
• first item in bucket 1, second item in bucket 2, third item in bucket 1 → (2,1,0,0)
• first item in bucket 2, second item in bucket 1, third item in bucket 1 → (2,1,0,0)

This number can be calculated by taking the factorial of the number of items i (3!) and dividing it by the product of the factorial of the values (ignoring the zeros, because 0! = 1):

(3,0,0,0) → 3! / 3! = 1
(2,1,0,0) → 3! / (2! × 1!) = 3
(1,1,1,0) → 3! / (1! × 1! × 1!) = 6
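This multinomial count is equally short in Python (a sketch; the function name is my own):

```python
from math import factorial

def distributions_count(partition):
    # Number of ways the individual items can land to produce this
    # bucket-count vector: i! divided by the product of the counts'
    # factorials (zeros contribute 0! = 1).
    result = factorial(sum(partition))
    for count in partition:
        result //= factorial(count)
    return result

print(distributions_count((3, 0, 0, 0)))  # 1
print(distributions_count((2, 1, 0, 0)))  # 3
print(distributions_count((1, 1, 1, 0)))  # 6
```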

Step 4: calculating the probability

Then, for each partition, multiply the number of permutations and the number of distributions, and sum these to find the total number of possibilities:

4×1 + 12×3 + 4×6 = 64

Or simply use this calculation:

buckets ^ items = 4 ^ 3 = 64

And then the probability per partition is the number of permutations times the number of distributions, divided by the total number of possibilities:

(3,0,0,0) → 4×1 / 64 = 4 / 64 = 1 / 16
(2,1,0,0) → 12×3 / 64 = 36 / 64 = 9 / 16
(1,1,1,0) → 4×6 / 64 = 24 / 64 = 6 / 16
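Steps 2 through 4 combine into a single closed-form probability per class; a compact Python sketch using exact fractions (the function name is my own):

```python
from math import factorial
from fractions import Fraction
from collections import Counter

def class_probability(partition):
    # permutations × distributions / buckets^items, as an exact fraction
    buckets = len(partition)
    items = sum(partition)
    perms = factorial(buckets)              # step 2: orderings of the partition
    for multiplicity in Counter(partition).values():
        perms //= factorial(multiplicity)
    dists = factorial(items)                # step 3: item placements per ordering
    for count in partition:
        dists //= factorial(count)
    return Fraction(perms * dists, buckets ** items)  # step 4

print(class_probability((3, 0, 0, 0)))  # 1/16
print(class_probability((2, 1, 0, 0)))  # 9/16
print(class_probability((1, 1, 1, 0)))  # 3/8
```

The three class probabilities sum to 1, which is a useful sanity check on any implementation.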

Code example

This is an example of the basic algorithm without memoization:

function partitions(items, buckets) {
    var p = [];
    partition(items, buckets, items, []);
    probability();
    return p;

    function partition(val, parts, max, prev) {
        for (var i = Math.min(val, max); i >= val / parts; i--) {
            if (parts == 2) p.push({part: prev.concat([i, val - i])});
            else partition(val - i, parts - 1, i, prev.concat([i]));
        }
    }
    function probability() {
        var total = Math.pow(buckets, items);
        for (var i = 0; i < p.length; i++) {
            p[i].prob = permutations(p[i].part) * distributions(p[i].part) / total;
        }
    }
    function permutations(partition) {
        var dup = 1, perms = factorial(buckets);
        for (var i = 1; i < buckets; i++) {
            if (partition[i] != partition[i - 1]) {
                perms /= factorial(dup);
                dup = 1;
            } else ++dup;
        }
        return perms / factorial(dup);
    }            
    function distributions(partition) {
        var dist = factorial(items);
        for (var i = 0; i < buckets; i++) {
            dist /= factorial(partition[i]);
        }
        return dist;
    }
    function factorial(n) {
        var f = 1;
        while (n > 1) f *= n--;
        return f;
    }
}

var result = partitions(3,4);
for (var i in result) document.write(JSON.stringify(result[i]) + "<br>");
var result = partitions(7,5);
for (var i in result) document.write("<br>" + JSON.stringify(result[i]));


Additional restriction: bucket size

To adapt the results given by the algorithm to the situation where there is a maximum number of items any bucket can hold, you'd need to identify the invalid distributions, and then redistribute their probabilities among the valid distributions, e.g.:

i=5, b=4, s=5
5,0,0,0 → 1 / 256
4,1,0,0 → 15 / 256
3,2,0,0 → 30 / 256
3,1,1,0 → 60 / 256
2,2,1,0 → 90 / 256
2,1,1,1 → 60 / 256

i=5, b=4, s=4
4,1,0,0 → 16 / 256 ← (15 + 1)
3,2,0,0 → 30 / 256
3,1,1,0 → 60 / 256
2,2,1,0 → 90 / 256
2,1,1,1 → 60 / 256

i=5, b=4, s=3
3,2,0,0 → 35.333 / 256 ← (30 + 16 × 1/3)
3,1,1,0 → 70.666 / 256 ← (60 + 16 × 2/3)
2,2,1,0 → 90 / 256
2,1,1,1 → 60 / 256

i=5, b=4, s=2
2,2,1,0 → 172.444 / 256 ← (90 + 35.333 + 70.666 × 2/3)
2,1,1,1 → 83.555 / 256 ← (60 + 70.666 × 1/3)
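The redistributed values above can be cross-checked exactly by following the insertion process itself: each item goes into a uniformly chosen bucket among those not yet full, which is what the rejection loop in the question's simulator does. A Python sketch using exact fractions (the function name is my own; states are kept sorted because the buckets are anonymous):

```python
from fractions import Fraction

def restricted_probabilities(items, buckets, size):
    # Walk the probability tree level by level: each item is placed
    # uniformly at random into one of the buckets that is not yet full.
    level = {tuple([0] * buckets): Fraction(1)}
    for _ in range(items):
        nxt = {}
        for state, p in level.items():
            open_idx = [i for i, c in enumerate(state) if c < size]
            for i in open_idx:
                child = list(state)
                child[i] += 1
                # Sort to merge states that differ only by bucket labels.
                child = tuple(sorted(child, reverse=True))
                nxt[child] = nxt.get(child, Fraction(0)) + p / len(open_idx)
        level = nxt
    return level

for dist, p in sorted(restricted_probabilities(3, 3, 2).items(), reverse=True):
    print(dist, p)
# (2, 1, 0) 7/9
# (1, 1, 1) 2/9
```

For i=5, b=4, s=2 this yields 97/144 for (2,2,1,0) and 47/144 for (2,1,1,1), which are exactly the 172.444/256 and 83.555/256 shown above.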

The diagram below shows which probabilities have to be recalculated when restricting a 7-item 3-bucket distribution to a bucket size of 4.

The algorithm for unrestricted bucket size calculates the bottom row in this diagram, without having to create the whole tree. You can adapt it to directly calculate the probability of any single distribution (see code examples below).

So you could run the original algorithm first and then, for each invalid distribution, find which distributions in the previous row lead to the invalid distribution, and redistribute the probabilities.

In the example shown in the diagram, that would mean redistributing the probabilities indicated in red, which is the sub-tree under (4,0,0) where 4 is the maximum bucket size.

So in general, to adapt the probabilities found with the original algorithm to a restricted bucket size of s, you start at (s,0,0), find all children of this distribution: (s+1,0,0) and (s,1,0), calculate their probability, take the probability of the invalid distributions and redistribute them among the valid distributions, and then move on to the next row, until you've recalculated all distributions in the final row that have the value s in them.

The code example below, based on the original algorithm, shows how to directly find the probability of any distribution with unrestricted bucket size. This would be a basic step in the new algorithm.

function probability(part) { // array of items per bucket
    var items = part.reduce(function(s, n) {return s + n}, 0);
    var buckets = part.length;
    var f = factorial(Math.max(items, buckets));
    var total = Math.pow(buckets, items);
    return permutations() * distributions() / total;

    function permutations() {
        var dup = 1, perms = f[buckets];
        for (var i = 1; i < buckets; i++) {
            if (part[i] != part[i - 1]) {
                perms /= f[dup];
                dup = 1;
            } else ++dup;
        }
        return perms / f[dup];
    }            
    function distributions() {
        return part.reduce(function(t, n) {return t / f[n]}, f[items]);
    }
    function factorial(max) {
        var f = [1];
        for (var i = 1; i <= max; i++) f[i] = f[i - 1] * i;
        return f;
    }
}

document.write(probability([2,2,1,1,1]));

The following code example uses a different notation for the partitions: instead of an array whose values are the number of items per bucket, it uses an array where the value at index i is the number of buckets holding i items; so [0,3,2,0] means 3 buckets with 1 item and 2 buckets with 2 items, or [2,2,2,1,1] in the regular notation.

function probability(part) { // array of buckets per amount of items
    var items = part.reduce(function(t, n, i) {return t + n * i}, 0);
    var buckets = part.reduce(function(t, n) {return t + n}, 0);
    var f = factorial(Math.max(items, buckets));
    var total = Math.pow(buckets, items);
    return permutations() * distributions() / total;

    function permutations() {
        return part.reduce(function(t, n) {return t / f[n]}, f[buckets]);
    }            
    function distributions() {
        return part.reduce(function(t, n, i) {return t / Math.pow(f[i], n)}, f[items]);
    }
    function factorial(max) {
        var f = [1];
        for (var i = 1; i <= max; i++) f[i] = f[i - 1] * i;
        return f;
    }
}
document.write(probability([0,3,2,0,0,0,0,0]));

Below is an implementation of the complete algorithm using the alternative notation; it generates the valid distributions (taking into account the bucket size), calculates the probabilities they would have with unrestricted bucket size, and then redistributes the probabilities of the invalid distributions.

function Dictionary() {
    this.pair = {};
    this.add = function(array, value) {
        var key = array.join(',');
        if (key in this.pair) this.pair[key] += value;
        else this.pair[key] = value;
    }
    this.read = function() {
        for (var key in this.pair) {
            var prob = this.pair[key], array = key.split(',');
            array.forEach(function(v, i, a) {a[i] = parseInt(v);});
            delete this.pair[key];
            return {dist: array, prob: prob};
        }
    }
}
function partitions(items, buckets, size) {
    var dict = new Dictionary, init = [buckets], total = Math.pow(buckets, items), f = factorial();
    for (var i = 0; i < size; i++) init.push(0);
    partition(items, init, 0);
    if (size < items) redistribute();
    return dict;

    function partition(val, dist, min) {
        for (var i = min; i < size; i++) {
            if (dist[i]) {
                var d = dist.slice();
                --d[i]; ++d[i + 1];
                if (val - 1) partition(val - 1, d, i);
                else dict.add(d, probability(d));
            }
        }
        function probability(part) {
            return part.reduce(function(t, n, i) {return t / Math.pow(f[i], n);}, f[items])
            * part.reduce(function(t, n) {return t / f[n];}, f[buckets]) / total;
        }
    }
    function redistribute() {
        var parents = new Dictionary;
        --init[0]; ++init[size];
        parents.add(init, 0);
        for (var i = size; i < items; i++) {
            var children = new Dictionary;
            while (par = parents.read()) {
                var options = buckets - par.dist[size];
                par.prob += probability(par.dist, i) * par.dist[size] / buckets;
                for (var j = 0, b = 0; j < options; j += par.dist[b++]) {
                    while (par.dist[b] == 0) ++b;
                    var child = par.dist.slice();
                    --child[b]; ++child[b + 1];
                    if (i == items - 1) dict.add(child, par.prob * par.dist[b] / options);
                    else children.add(child, par.prob * par.dist[b] / options);
                }
            }
            parents = children;
        }
        function probability(part, items) {
            return part.reduce(function(t, n, i) {return t / Math.pow(f[i], n);}, f[items])
            * part.reduce(function(t, n) {return t / f[n];}, f[buckets]) / Math.pow(buckets, items);
        }
    }
    function factorial() {
        var f = [1], max = Math.max(items, buckets);
        for (var i = 1; i <= max; i++) f[i] = f[i - 1] * i;
        return f;
    }
}
var result = partitions(3,3,2);
for (var i in result.pair) document.write(i + " &rarr; " + result.pair[i] + "<br>");
var result = partitions(7,3,4);
for (var i in result.pair) document.write("<br>" + i + " &rarr; " + result.pair[i]);


(Below is a first idea; it turned out to be too inefficient.)

While working on this, I came up with a completely different version of the algorithm, using a dictionary. It is probably closer to what you were originally doing, but should be more efficient. Basically, I'm using a different way of storing a distribution; where I previously wrote:

(3,2,2,1,0)

for 5 buckets filled with 3, 2, 2, 1 and zero items, I now use:

(1,1,2,1)

for 1 empty bucket, 1 bucket with 1 item, 2 buckets with 2 items, and 1 bucket with 3 items. The length of this list is determined by the bucket size plus 1, and no longer by the number of buckets.

The advantage of this is that duplicates like:

(3,2,2,1,0) (3,1,2,0,2) (1,2,2,3,0) (0,1,2,2,3) ...

Now all have the same notation.
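The conversion from the regular notation to this one is a simple tally; a minimal Python sketch (the function name is my own):

```python
def to_count_notation(dist, size):
    # Convert a regular distribution (items per bucket) into the
    # alternative notation: index i holds the number of buckets
    # containing exactly i items.
    counts = [0] * (size + 1)
    for items_in_bucket in dist:
        counts[items_in_bucket] += 1
    return counts

print(to_count_notation([3, 2, 2, 1, 0], size=3))  # [1, 1, 2, 1]
```

All four duplicates listed above map to the same [1,1,2,1], which is what makes this representation useful as a dictionary key.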

The code example below uses the dictionary idea with the new notation. (Unfortunately, it turns out to be quite slow, so I'll continue to look into improving it.)

function Dictionary() {
    this.dict = [];
}
Dictionary.prototype.add = function(array, value) {
    var key = array.join(',');
    if (this.dict[key] == undefined) this.dict[key] = value;
    else this.dict[key] += value;
}
Dictionary.prototype.print = function() {
    for (var i in this.dict) document.write(i + " &rarr; " + this.dict[i] + "<br>");
}

function probability(items, buckets, size) {
    var dict = new Dictionary, dist = [buckets];
    for (var i = 0; i < size; i++) dist.push(0);
    p(dist, 0, 1);
    return dict;

    function p(dist, value, prob) {
        var options = buckets - dist[size];
        for (var i = 0, b = 0; i < options; i += dist[b++]) {
            while (dist[b] == 0) ++b;
            var d = dist.slice();
            --d[b]; ++d[b + 1];
            if (value + 1 < items) p(d, value + 1, prob * dist[b] / options);
            else dict.add(d, prob * dist[b] / options);
        }
    }
}

probability(3, 3, 3).print();
probability(3, 3, 2).print();
probability(5, 4, 5).print();
probability(5, 4, 2).print();
