Are numpy binomial random numbers inefficient?

Question

I have been sampling random numbers from different distributions and just realized how slow numpy binomial random numbers are compared to other distributions. For instance:

%timeit for x in range(100): np.random.binomial(100,0.5)
10000 loops, best of 3: 82.6 µs per loop
%timeit for x in range(100): np.random.uniform()
100000 loops, best of 3: 14.6 µs per loop

A binomial number takes about 6 times longer than a uniform one! This is understandable, since the binomial is discrete and requires a more complex transformation. But if I ask for a binomial with n=0 or n=1 trials, the time spent is about the same:

%timeit for x in range(100): np.random.binomial(0,0.5)
10000 loops, best of 3: 78.8 µs per loop

%timeit for x in range(100): np.random.binomial(1,0.5)
10000 loops, best of 3: 80.1 µs per loop
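
To reproduce measurements like these outside IPython, here is a minimal sketch using the standard-library `timeit` module. The `CASES` table and the `per_draw_microseconds` helper are illustrative names, not part of the original question. The last case is also an addition: one vectorized call that produces 100 draws, reported per draw. If the scalar `binomial(0, 0.5)` and `binomial(100, 0.5)` calls cost about the same, while the vectorized call is much cheaper per draw, most of the scalar cost is probably fixed per-call overhead rather than the sampling algorithm itself. Absolute numbers will depend on your machine and NumPy version.

```python
import timeit

import numpy as np

# label -> (statement to time, number of random draws that statement produces)
CASES = {
    "uniform()": ("np.random.uniform()", 1),
    "binomial(100, 0.5)": ("np.random.binomial(100, 0.5)", 1),
    "binomial(1, 0.5)": ("np.random.binomial(1, 0.5)", 1),
    "binomial(0, 0.5)": ("np.random.binomial(0, 0.5)", 1),
    # One vectorized call producing 100 draws at once
    "binomial(100, 0.5, size=100)": ("np.random.binomial(100, 0.5, size=100)", 100),
}


def per_draw_microseconds(stmt, draws_per_call, number=2_000, repeat=5):
    """Best-of-`repeat` time per random draw produced by `stmt`, in microseconds."""
    times = timeit.repeat(stmt, globals={"np": np}, number=number, repeat=repeat)
    return min(times) / (number * draws_per_call) * 1e6


if __name__ == "__main__":
    for label, (stmt, draws) in CASES.items():
        print(f"{label:>30}: {per_draw_microseconds(stmt, draws):.3f} µs per draw")
```

Taking the minimum over several repeats follows the usual `timeit` advice: larger values mostly reflect interference from other processes.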

This does not seem very efficient, because the result of these samplings should be trivial: for zero trials the result is always zero, and for one trial it is a simple Bernoulli trial. So, for instance, a faster implementation of the binomial would be:

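import numpy as np

def custombinomial(n, p):
    # Zero trials: the result is always 0
    if n == 0:
        return 0
    # One trial: a single Bernoulli draw from one uniform number
    if n == 1:
        return 1 if np.random.uniform() < p else 0
    # Otherwise fall back to numpy's general binomial sampler
    return np.random.binomial(n, p)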

Timings:

%timeit for x in range(100): custombinomial(0,0.5)
100000 loops, best of 3: 11.8 µs per loop

%timeit for x in range(100): custombinomial(1,0.5)
10000 loops, best of 3: 31.2 µs per loop

I am sure this could be improved for larger values of n as well. Is there a reason I am missing why numpy is so slow? Is there any other library that can give faster random numbers (even if it involves some C/Cython)?

Additionally, I know that numpy works well if I want to create many random numbers at once, i.e. an array of binomially distributed numbers. In many cases, however, the distribution parameters n and p change on the fly, so drawing all the numbers in a single batch is not directly an option. Would it be possible to generate an array of uniformly distributed random numbers up front and transform them into the particular binomials as they are needed?
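
One way to approach this is inverse-transform sampling. Uniforms are generated in blocks with one vectorized call. Each binomial draw then consumes one uniform and walks the binomial CDF until it exceeds that uniform. This is a minimal sketch under several assumptions. `BufferedBinomial` is a hypothetical helper. The cutoff `INVERSION_LIMIT = 30` is an arbitrary choice, above which the sketch simply delegates to NumPy's `Generator.binomial`. The sketch is not claimed to be faster in pure Python, since the per-draw loop runs in the interpreter. The structure mainly pays off if ported to C, Cython or a JIT.

```python
import numpy as np


class BufferedBinomial:
    """Hypothetical helper: Binomial(n, p) draws with n and p free to change
    on every call, obtained by inverting the CDF on pre-generated uniforms."""

    INVERSION_LIMIT = 30.0  # assumed cutoff on n * min(p, 1 - p)

    def __init__(self, seed=None, block_size=4096):
        self._rng = np.random.default_rng(seed)
        self._block_size = block_size
        self._buffer = self._rng.random(block_size)
        self._pos = 0

    def _next_uniform(self):
        # Refill the whole block with one vectorized call when it runs out
        if self._pos == self._block_size:
            self._buffer = self._rng.random(self._block_size)
            self._pos = 0
        u = float(self._buffer[self._pos])
        self._pos += 1
        return u

    def __call__(self, n, p):
        if n < 0 or not 0.0 <= p <= 1.0:
            raise ValueError("need n >= 0 and 0 <= p <= 1")
        if p > 0.5:
            # Symmetry: Binomial(n, p) has the law of n - Binomial(n, 1 - p)
            return n - self(n, 1.0 - p)
        if n == 0 or p == 0.0:
            return 0
        if n * p > self.INVERSION_LIMIT:
            # The sequential search gets slow (and q**n can underflow) here
            return int(self._rng.binomial(n, p))
        u = self._next_uniform()
        q = 1.0 - p
        ratio = p / q
        prob = q ** n  # P(X = 0)
        cdf = prob
        k = 0
        # Smallest k with P(X <= k) >= u
        while u > cdf and k < n:
            prob *= ratio * (n - k) / (k + 1)  # P(X = k + 1) from P(X = k)
            k += 1
            cdf += prob
        return k
```

Usage: create `draw = BufferedBinomial(seed=42)` once, then call `draw(n, p)` with whatever parameters the simulation needs at each step. The symmetry step keeps p at or below 0.5, so with the cutoff in place `q ** n` stays far from underflow.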

Answer

Numpy's legacy binomial random generator is implemented in C, and the algorithm uses numerical inversion if the parameters are sufficiently small. This may be more work than necessary when p = 0.5, since the binomial generator could have used random bits instead of random doubles. In addition, the basic algorithm seems not to have changed for years (see also mtrand.pyx), so it does not take advantage of vectorization or multithreading, for example.
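
To illustrate the remark about random bits: for p = 0.5, a Binomial(n, 0.5) variate is the number of heads in n fair coin flips. That is the number of 1 bits in an n-bit random integer. The minimal sketch below uses Python's standard `random` module. It is not NumPy's generator and not what NumPy does internally. `binomial_half` is a hypothetical name.

```python
import random


def binomial_half(n, rng=random):
    """Draw from Binomial(n, 0.5) as the popcount of n uniformly random bits."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return 0  # getrandbits(0) raises ValueError before Python 3.9
    # Each of the n bits is an independent fair coin flip
    return bin(rng.getrandbits(n)).count("1")
```

A single draw consumes exactly n random bits instead of one or more 53-bit doubles. On Python 3.10+, `int.bit_count()` can replace the `bin(...).count("1")` idiom.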

Moreover, in the early days of Numpy there wasn't "much cause to change the distribution methods that much", so this and other random generation algorithms in Numpy were retained in the name of reproducible "randomness". This changed with version 1.17: breaking changes to random generation methods, such as a new binomial random algorithm, are now allowed, but are treated as new features that will be introduced only "on X.Y releases, never X.Y.Z". For details, see "RNG Policy" and "Random Sampling (numpy.random)".
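
Version 1.17 also introduced the `Generator` API (`np.random.default_rng`), the home for such newer methods. The short sketch below is only an illustration; the seed and parameter values are arbitrary. It shows that `Generator.binomial` accepts arrays for `n` and `p` and broadcasts them. Draws with different parameters can therefore still be produced in one vectorized call, whenever the parameters are known ahead of time. Whether this is faster than the legacy function on a given setup is something to measure rather than assume.

```python
import numpy as np

# Generator API (NumPy >= 1.17); the seed is arbitrary
rng = np.random.default_rng(12345)

# A different (n, p) pair for each draw
n = np.array([0, 1, 10, 100, 1000])
p = np.array([0.5, 0.5, 0.1, 0.5, 0.9])

# One call, one binomial draw per (n[i], p[i]) pair via broadcasting
draws = rng.binomial(n, p)

# Scalar calls still work when n and p are only known one step at a time
single = rng.binomial(10, 0.3)
```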

If having faster binomial random numbers matters to you, you should file a new Numpy issue.
