Generate random numbers replicating arbitrary distribution

Question

I have data in which a variable z contains around 4000 values (from 0.0 to 1.0), for which the histogram looks like this.

Now I need to generate a random variable, call it random_z, which should replicate the above distribution.

What I have tried so far is to generate a normal distribution centered at 1.0 so that I can remove all values above 1.0 to get a similar distribution. I have been using numpy.random.normal, but the problem is that I cannot restrict the range to 0.0 to 1.0, because the normal distribution usually has mean = 0.0 and std dev = 1.0.

Is there another way to go about generating this distribution in Python?

Recommended answer

If you want to bootstrap, you could use random.choice() on your observed series.
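As a minimal sketch of the bootstrap option, assuming the observed values sit in a NumPy array: the array `z` below is a made-up stand-in for the real data, and the NumPy generator's `choice` method is used here in place of the standard-library `random.choice()` so that many values can be drawn at once.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen only for reproducibility

# Hypothetical stand-in for the observed data z; replace with your own values.
z = rng.random(4000) ** 3

# Bootstrap: draw new values by resampling the observed ones with replacement.
random_z = rng.choice(z, size=1000, replace=True)
```

Every value produced this way is one of the observed values, which is why the approach below is preferable if you want some smoothing between them.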

Here I'll assume you'd like to smooth a bit more than that and you aren't concerned with generating new extreme values.

Use pandas.Series.quantile() and a uniform [0,1] random number generator, as follows.

Training

  • Put your random sample into a pandas Series; call this Series S.

Production

  1. Generate a random number u between 0.0 and 1.0 the usual way, e.g., random.random()
  2. Return S.quantile(u)
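The training and production steps above can be sketched roughly as follows. The sample is a made-up stand-in, and the helper name `draw_one` is hypothetical, introduced only for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # seed chosen only for reproducibility

# Training: put the observed sample into a pandas Series S.
# (Hypothetical stand-in data; replace with your own observations.)
S = pd.Series(rng.random(4000) ** 3)


def draw_one(sample, rng):
    """Return one new value distributed approximately like `sample`."""
    u = rng.random()           # uniform random number in [0, 1)
    return sample.quantile(u)  # empirical quantile of the sample at level u


# Production: each call yields one new random value.
random_z = draw_one(S, rng)
```

Because the quantile is interpolated between observed values, the result never falls outside the range of S.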

If you'd rather use numpy than pandas, from a quick reading it looks like you can substitute numpy.percentile() in step 2.

How it works:

From the sample S, pandas.Series.quantile() or numpy.percentile() is used to calculate the inverse cumulative distribution function for the method of inverse transform sampling. The quantile or percentile function (relative to S) transforms a uniform [0,1] pseudo-random number into a pseudo-random number having the range and distribution of the sample S.
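The reasoning behind inverse transform sampling can be written out in one line. This assumes an idealized, continuous and strictly increasing cumulative distribution function F; the quantile function computed from the finite sample S only approximates its inverse.

```latex
U \sim \mathrm{Uniform}(0,1), \quad X = F^{-1}(U)
\;\Longrightarrow\;
P(X \le x) = P\bigl(F^{-1}(U) \le x\bigr) = P\bigl(U \le F(x)\bigr) = F(x).
```

So X has cumulative distribution function F, which is the distribution we set out to replicate.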

If you need to minimize coding and don't want to write and use functions that only return a single realization, then numpy.percentile seems to be a better choice than pandas.Series.quantile.

Let S be a pre-existing sample.

u will be the new uniform random numbers.

newR will be the new random numbers drawn from an S-like distribution.

>>> import numpy as np

I need a sample of the kind of random numbers to be duplicated to put in S.

For the purposes of creating an example, I am going to raise some uniform [0,1] random numbers to the third power and call that the sample S. By generating the example sample this way, I know in advance that the mean of S should be 1/(3+1) = 1/4 = 0.25, because the mean equals the definite integral of x^3 dx from 0 to 1.

In your application, you would need to do something else instead, perhaps read a file, to create a numpy array S containing the data sample whose distribution is to be duplicated.

>>> S = pow(np.random.random(1000),3)  # S will be 1000 samples of a power distribution

Here I will check that the mean of S is close to 0.25, as stated above.

>>> S.mean()
0.25296623781420458 # OK

Get the min and max, just to show how np.percentile works:

>>> S.min()
6.1091277680105382e-10
>>> S.max()
0.99608676594692624

The numpy.percentile function maps 0-100 to the range of S.

>>> np.percentile(S,0)  # this should match the min of S
6.1091277680105382e-10 # and it does

>>> np.percentile(S,100) # this should match the max of S
0.99608676594692624 # and it does

>>> np.percentile(S,[0,100])  # this should send back an array with both min, max
[6.1091277680105382e-10, 0.99608676594692624]  # and it does

>>> np.percentile(S,np.array([0,100])) # but this doesn't.... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/numpy/lib/function_base.py", line 2803, in percentile
    if q == 0:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

This isn't so great if we generate 100 new values, starting with uniforms:

>>> u = np.random.random(100)

because passing the array u directly would error out as shown above, and u is on a 0-1 scale while np.percentile needs 0-100.

This will work:

>>> newR = np.percentile(S, (100*u).tolist()) 

which works fine, but the result might need its type adjusted if you want a numpy array back:

>>> type(newR)
<type 'list'>

>>> newR = np.array(newR)

Now we have a numpy array. Let's check the mean of the new random values.

>>> newR.mean()
0.25549728059744525 # close enough
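The session above ran on Python 2.7 with an older NumPy. As a hedged sketch for a reasonably recent NumPy (an assumption; check your installed version): np.quantile takes q on the 0-1 scale and accepts an array, so the multiplication by 100 and the list conversion should not be needed. The seed and sample sizes are chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen only for reproducibility

S = rng.random(1000) ** 3  # example sample, built as in the session above
u = rng.random(100)        # 100 new uniform random numbers in [0, 1)

# np.quantile expects q in [0, 1], so u can be passed as-is;
# the result is a NumPy array with one new value per element of u.
newR = np.quantile(S, u)
```

As before, the mean of newR should land near 0.25, within sampling error.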
