Generate Random Numbers with Probabilistic Distribution


Question

OK, so here's my problem. We are looking at purchasing a data set from a company to augment our existing data set. For the purposes of this question, let's say that this data set ranks places with an organic number (meaning that the number assigned to one place has no bearing on the number assigned to another). The technical range is 0 to infinity, but in the sample sets I've seen, it's 0 to 70. Based on the sample, it's most definitely not a uniform distribution: out of 10,000 places there are maybe 5 with a score over 40, 50 with a score over 10, and 1,000 with a score over 1. Before we decide to purchase this set, we would like to simulate it so that we can see how useful it may be.

So, to simulate it, I've been thinking about generating a random number for each place (about 150,000 random numbers). But I also want to keep to the spirit of the data, and keep the distribution relatively the same (or at least reasonably close). I've been racking my brain all day trying to think of a way to do it, and have come up empty.

One thought I had was to square a uniform random number between 0 and sqrt(70). But that would favor both numbers less than 1 and larger numbers.
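A rough check of that concern, as a sketch rather than anything from the original post: if U is uniform on [0, sqrt(70)] and X = U^2, then P(X <= x) = sqrt(x/70), so the share of each score bucket can be computed directly. The names `UPPER`, `squared_uniform_cdf`, and `bucket_shares` are made up for this illustration.

```python
import math

UPPER = 70.0  # largest score seen in the sample, per the question


def squared_uniform_cdf(x: float) -> float:
    """P(X <= x) for X = U**2 with U uniform on [0, sqrt(UPPER)]."""
    x = min(max(x, 0.0), UPPER)  # clamp to the supported range
    return math.sqrt(x / UPPER)


def bucket_shares(edges):
    """Probability mass between each pair of consecutive edges."""
    return [
        squared_uniform_cdf(hi) - squared_uniform_cdf(lo)
        for lo, hi in zip(edges, edges[1:])
    ]


if __name__ == "__main__":
    edges = [0, 1, 10, 40, 70]
    for (lo, hi), share in zip(zip(edges, edges[1:]), bucket_shares(edges)):
        print(f"{lo}-{hi}: {share:.1%}")
```

Under these assumptions the 40-70 bucket receives far more mass than the target of 0.02%-0.05%, which is consistent with the worry that squaring over-represents large scores.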

I'm thinking that the real distribution should be hyperbolic in the first quadrant... I'm just blanking on how to turn a linear, even distribution of random numbers into a hyperbolic distribution (if hyperbolic is even what I want in the first place).

Any ideas?

So, to sum up, here's the distribution I would like (approximately):

  • 40-70: 0.02%-0.05%
  • 10-40: 0.5%-1%
  • 1-10: 10%-20%
  • 0-1: Remaining (78.95%-89.48%)

Recommended answer

Look at distributions used in reliability analysis; they tend to have these long tails. A relatively simple possibility is the Weibull distribution with P(X>x) = exp[-(x/b)^a].

Fitting your values as P(X>1) = 0.1 and P(X>10) = 0.005, I get a = 0.36 and b = 0.1. This would imply that P(X>40)*10000 = 1.6, which is a bit too low, but P(X>70)*10000 = 0.2, which is reasonable.
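The fit can be reproduced in closed form. This is a minimal sketch, not the answerer's own calculation: taking -log of P(X>x) = exp[-(x/b)^a] at both points gives (x/b)^a = -log(p), and dividing the two equations eliminates b. The function names `fit_weibull` and `tail` are made up for this illustration.

```python
import math


def fit_weibull(x1, p1, x2, p2):
    """Find shape a and scale b so that exp(-(x/b)**a) = p at two points."""
    h1, h2 = -math.log(p1), -math.log(p2)  # (x/b)**a at each point
    a = math.log(h2 / h1) / math.log(x2 / x1)  # ratio of the two removes b
    b = x1 / h1 ** (1.0 / a)  # back-substitute into the first equation
    return a, b


def tail(x, a, b):
    """Survival function P(X > x) of the Weibull distribution."""
    return math.exp(-((x / b) ** a))


if __name__ == "__main__":
    a, b = fit_weibull(1.0, 0.1, 10.0, 0.005)
    print(f"a = {a:.2f}, b = {b:.2f}")
    for x in (40, 70):
        expected = 10000 * tail(x, a, b)
        print(f"expected places above {x} per 10,000: {expected:.1f}")
```

With the unrounded parameters, the tail values come out close to the 1.6 and 0.2 quoted above.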

EDIT: Oh, and to generate a Weibull-distributed random variable from a uniform(0,1) value U, just calculate b*[-log(1-U)]^(1/a). This is the inverse function of 1-P(X>x), in case I miscalculated something.
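The formula above could be sketched in Python as follows. The defaults a = 0.36 and b = 0.1 are the rounded values from this answer; the helper names `weibull_sample`, `simulate`, and `share_above`, and the fixed seed, are assumptions made for this illustration.

```python
import math
import random


def weibull_sample(a, b, rng=random):
    """Draw one Weibull(shape=a, scale=b) value by inverting the CDF."""
    u = rng.random()  # uniform on [0, 1), so 1 - u is never 0
    return b * (-math.log(1.0 - u)) ** (1.0 / a)


def simulate(n, a=0.36, b=0.1, seed=0):
    """Generate n simulated scores from a seeded generator."""
    rng = random.Random(seed)
    return [weibull_sample(a, b, rng) for _ in range(n)]


def share_above(scores, threshold):
    """Fraction of scores strictly greater than threshold."""
    return sum(s > threshold for s in scores) / len(scores)


if __name__ == "__main__":
    scores = simulate(150_000)
    for t in (1, 10, 40):
        print(f"share above {t}: {share_above(scores, t):.4%}")
```

Python's standard library also provides `random.weibullvariate(alpha, beta)`, where `alpha` is the scale and `beta` is the shape, so `random.weibullvariate(0.1, 0.36)` should draw from the same distribution. Note that these samples are not capped at 70; whether to clip or redraw larger values is a separate modeling choice.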
