C++ discrete distribution sampling with frequently changing probabilities


Problem Description



Problem: I need to sample from a discrete distribution built from a set of weights, e.g. {w1, w2, w3, ...}, and thus from the probability distribution {p1, p2, p3, ...}, where pi = wi / (w1 + w2 + ...).

Some of the wi change very frequently, but only a very small proportion of all of them. The distribution therefore has to be renormalised every time this happens, which is why I believe the alias method does not work efficiently here: one would need to rebuild the whole distribution from scratch on every change.
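For concreteness, the naive baseline would be to rebuild std::discrete_distribution from <random> after every change; its O(n) construction per update is exactly the cost to avoid. A minimal sketch (function and variable names are my own):

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Naive baseline: rebuild the whole distribution after each weight
// change. Construction is O(n), so with frequent updates the rebuild
// dominates -- this is exactly the cost the question wants to avoid.
int sample_after_change(std::vector<double>& w, std::size_t i,
                        double new_wi, std::mt19937& gen) {
    w[i] = new_wi;
    std::discrete_distribution<int> dist(w.begin(), w.end());
    return dist(gen);  // the draw itself is cheap; the rebuild is not
}
```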

The method I am currently considering is a binary tree (the heap method), where all the wi are stored in the lowest level, the sum of each adjacent pair in the level above, and so on. The sum of all of them is at the root, which is also the normalisation constant. To update the tree after a change to a wi one needs O(log n) updates, and drawing a sample from the distribution costs O(log n) as well.
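A minimal sketch of this sum-tree idea, stored implicitly in a flat vector (class and member names are my own invention):

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Sum tree: leaves hold the weights, internal nodes hold subtree sums,
// the root holds the total weight (the normalisation constant).
// update() and sample() are both O(log n).
class SumTree {
public:
    explicit SumTree(const std::vector<double>& w) {
        assert(!w.empty());
        n_ = 1;
        while (n_ < w.size()) n_ *= 2;       // round leaf count up to 2^k
        tree_.assign(2 * n_, 0.0);
        for (std::size_t i = 0; i < w.size(); ++i) tree_[n_ + i] = w[i];
        for (std::size_t i = n_ - 1; i >= 1; --i)
            tree_[i] = tree_[2 * i] + tree_[2 * i + 1];
    }

    // Change weight i to w: only the O(log n) ancestors are touched.
    void update(std::size_t i, double w) {
        std::size_t node = n_ + i;
        tree_[node] = w;
        for (node /= 2; node >= 1; node /= 2)
            tree_[node] = tree_[2 * node] + tree_[2 * node + 1];
    }

    // Draw index i with probability w_i / total: descend from the root,
    // going left or right according to where a uniform draw in
    // [0, total) falls -- i.e. a top-down search by cumulative sum.
    std::size_t sample(std::mt19937& gen) const {
        double u = std::uniform_real_distribution<double>(0.0, tree_[1])(gen);
        std::size_t node = 1;
        while (node < n_) {
            if (u < tree_[2 * node]) {
                node = 2 * node;             // descend left
            } else {
                u -= tree_[2 * node];        // skip left subtree's mass
                node = 2 * node + 1;         // descend right
            }
        }
        return node - n_;                    // leaf index
    }

private:
    std::size_t n_;                // number of leaves (power of two)
    std::vector<double> tree_;     // tree_[1] is the root (total weight)
};
```

Usage would be along the lines of `SumTree t(w); t.update(i, new_wi); std::size_t j = t.sample(gen);`. Note that sample() is literally a search by cumulative sum, which is also one way to answer Q2.1 below.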

Questions:

Q1. Do you have a better idea of how to achieve this faster?

Q2. The most important part: I am looking for a library which has already done this.

Explanation: I did this myself several years ago, by building a heap structure in a vector, but since then I have learned many things, including discovering libraries ( :) ) and containers such as map... Now I need to rewrite that code with more functionality, and I want to get it right this time:

So, Q2.1: is there a nice way to make a C++ map ordered and searched not by its keys, but by the cumulative sum of its elements (that is how we sample, right?..)? (This is my current theory of how I would like to do it, but it does not have to be this way...)

Q2.2: Maybe there is an even nicer way to do the same thing? I would think this problem is so common that I am very surprised I could not find a library which would do it for me...

Thank you very much, and I am very sorry if this has been asked in some other form; if so, please direct me towards it, but I have spent a good while looking...

-z

Edit: There is a possibility that I might also need to remove or add elements, but I think I could avoid that if it makes a huge difference, leaving only changes to the values of the weights.

Edit 2: the weights are reals in general; I would have to think about whether I could make them integers...

Solution

I would actually use a hash set of strings (std::unordered_set is the standard container, but for the random-probing trick below you will likely need to implement your own closed hash). For each i, put in wi elements with the values "w1_1", "w1_2", ... all the way through "w1_[w1]" (that is, w1 elements starting with "w1_").

When you need to sample, pick an element at random using a uniform distribution. If you picked "w5_*", say, then you have picked element 5. Because element i accounts for wi of the entries in the hash, this gives you exactly the distribution you are looking for.

Now, when wi changes from A to B, just add B-A new elements for wi to the hash (if B > A), or remove the last A-B of wi's elements (if A > B).

Adding new elements and removing old elements is trivial in this case.

Obviously the problem is 'pick an element at random'. If your hash is a closed hash (open addressing), you pick an array cell at random; if it is empty, just pick another at random. If you keep your hash 3 or 4 times larger than the total sum of the weights, the complexity will be pretty good: expected O(1) for retrieving a random sample, and O(|A-B|) for modifying a weight.
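A minimal sketch of this scheme, assuming integer weights (see Edit 2 above). Instead of literal strings like "w5_17" it stores the element index directly in a sparse slot array, which plays the role of the closed hash; a per-element list of occupied slots makes removing "the last A-B elements" cheap. All names are my own:

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Element i occupies w_i slots of a sparse array; sampling picks random
// slots until a non-empty one is hit. Keeping the array ~4x larger than
// the total weight makes both rejection loops expected O(1) per probe.
class TokenSampler {
public:
    TokenSampler(std::size_t n, std::size_t capacity)
        : slots_(capacity, -1), owned_(n) {}   // -1 marks an empty slot

    // Raise element i's weight by one: claim a random empty slot.
    void add_token(std::size_t i, std::mt19937& gen) {
        std::uniform_int_distribution<std::size_t> pick(0, slots_.size() - 1);
        std::size_t s;
        do { s = pick(gen); } while (slots_[s] != EMPTY);
        slots_[s] = static_cast<long>(i);
        owned_[i].push_back(s);
    }

    // Lower element i's weight by one: free its most recent slot.
    void remove_token(std::size_t i) {
        std::size_t s = owned_[i].back();
        owned_[i].pop_back();
        slots_[s] = EMPTY;
    }

    // Change w_i from its current value to w: O(|old - new|).
    void set_weight(std::size_t i, std::size_t w, std::mt19937& gen) {
        while (owned_[i].size() < w) add_token(i, gen);
        while (owned_[i].size() > w) remove_token(i);
    }

    // Sample an element index with probability w_i / sum(w).
    std::size_t sample(std::mt19937& gen) const {
        std::uniform_int_distribution<std::size_t> pick(0, slots_.size() - 1);
        std::size_t s;
        do { s = pick(gen); } while (slots_[s] == EMPTY);  // retry empties
        return static_cast<std::size_t>(slots_[s]);
    }

private:
    static constexpr long EMPTY = -1;
    std::vector<long> slots_;                       // the 'closed hash'
    std::vector<std::vector<std::size_t>> owned_;   // slots held by each i
};
```

Typical usage would be `TokenSampler ts(n, 4 * total_weight);`, then `ts.set_weight(i, w[i], gen)` for each element, and `ts.sample(gen)` to draw.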

Another option, since only a small part of your weights change, is to split the weights in two: a fixed part and a changing part. Then you only need to maintain the hash for the changing part, together with the total weight of each part, so you know with what probability to draw from each. For the fixed part the hash becomes a simple array of numbers: 1 appears w1 times, 2 appears w2 times, etc., and picking a random fixed element is just picking a random index into that array.
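A sketch of how the two pools would combine at sampling time, reusing the hypothetical TokenSampler above for the changing part (the glue code and names are again my own):

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Two-pool draw: choose the fixed pool or the changing pool in
// proportion to their total weights. `fixed_tokens` is the flattened
// fixed part (element i repeated w_i times), so a uniform random index
// into it is already a weighted draw.
std::size_t sample_split(const std::vector<std::size_t>& fixed_tokens,
                         double fixed_weight, double changing_weight,
                         const TokenSampler& changing, std::mt19937& gen) {
    std::bernoulli_distribution from_fixed(
        fixed_weight / (fixed_weight + changing_weight));
    if (from_fixed(gen)) {
        std::uniform_int_distribution<std::size_t> pick(
            0, fixed_tokens.size() - 1);
        return fixed_tokens[pick(gen)];
    }
    return changing.sample(gen);
}
```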
