Canonical way to generate random numbers in Cython


Question

What is the best way to generate uniform pseudo-random numbers (a double in [0, 1)) that is:

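  1. Cross platform (preferably with the same sample sequence)
  2. Thread safe (explicit passing of the PRNG, or thread-local state used internally)
  3. Without the GIL
  4. Easily wrappable in Cython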

There was a similar post over 3 years ago about this but a lot of the answers don't meet all criteria. For example, drand48 is POSIX-specific.

The only method I'm aware of that seems (though I'm not sure) to meet some of the criteria is:

from libc.stdlib cimport rand, RAND_MAX

random = rand() / (RAND_MAX + 1.0)

Note @ogrisel asked the same question about 3 years ago.

Edit

Calling rand is not thread safe. Thanks for pointing that out @DavidW.

Recommended answer

Big pre-answer caveat: this answer recommends using C++ because the question specifically asks for a solution that runs without the GIL. If you don't have this requirement (and you probably don't...) then Numpy is the simplest and easiest solution. Provided that you're generating large amounts of numbers at a time you will find Numpy perfectly quick. Don't be misled into a complicated exercise in wrapping C++ because someone asked for a no-gil solution.

Original answer:

I think the easiest way to do this is to use the C++11 standard library, which provides nicely encapsulated random number generators and ways to use them. This is of course not the only option, and you could wrap pretty much any suitable C/C++ library (one good option might be to use whatever library numpy uses, since that's most likely already installed).

My general advice is to only wrap the bits you need and not bother with the full hierarchy and all the optional template parameters. By way of example I've shown one of the default generators, fed into a uniform float distribution.

# distutils: language = c++
# distutils: extra_compile_args = -std=c++11

cdef extern from "<random>" namespace "std":
    cdef cppclass mt19937:
        mt19937() # we need to define this constructor to stack allocate classes in Cython
        mt19937(unsigned int seed) # not worrying about matching the exact int type for seed
    
    cdef cppclass uniform_real_distribution[T]:
        uniform_real_distribution()
        uniform_real_distribution(T a, T b)
        T operator()(mt19937 gen) # ignore the possibility of using other classes for "gen"
        
def test():
    cdef:
        mt19937 gen = mt19937(5)
        uniform_real_distribution[double] dist = uniform_real_distribution[double](0.0,1.0)
    return dist(gen)

(The -std=c++11 at the start is for GCC. For other compilers you may need to tweak this. Increasingly c++11 is a default anyway, so you can drop it)
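For reference, the wrapper above corresponds roughly to the plain C++ below. This is only a minimal sketch: the function name `sample_uniform` is made up for illustration and does not appear in the answer. It takes the generator by reference so that the generator's state advances between calls.

```cpp
#include <random>

// Hypothetical helper (not part of the original answer): draws one double
// in [0, 1) from a generator owned by the caller. Passing by reference means
// successive calls continue the same random sequence.
double sample_uniform(std::mt19937 &gen) {
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    return dist(gen);
}
```

Note that `test()` in the Cython code constructs `mt19937(5)` on every call, so it returns the same value each time. In real code you would probably create the generator once and keep it alive, for example as an attribute of a `cdef class`.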

Regarding your criteria:

  1. Cross platform on anything that supports C++. I believe the sequence should be specified so it's repeatable.
  2. Thread safe, since the state is stored entirely within the mt19937 object (each thread should have its own mt19937).
  3. No GIL - it's C++, with no Python parts
  4. Reasonably easy.
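
A caveat on points 1 and 2, with a sketch. The C++ standard fixes the output of the `mt19937` engine itself (the 10000th value from a default-constructed engine must be 4123659995), but it does not specify the algorithm used by `uniform_real_distribution`, so the resulting doubles may differ between standard library implementations. If identical sequences across platforms matter, one option is to scale the raw engine output yourself, in the same spirit as `rand() / (RAND_MAX + 1.0)` in the question. The names `uniform01` and `draw_samples` below are hypothetical, and the result carries only 32 random bits.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: maps one 32-bit mt19937 output to a double in [0, 1).
// Dividing by 2^32 is exact in double precision, so the result depends only
// on the engine output, which the standard specifies.
double uniform01(std::mt19937 &gen) {
    return gen() / 4294967296.0;  // 2^32
}

// Hypothetical worker body: each thread would call this with its own seed,
// so every thread owns a private generator and no locking is needed.
std::vector<double> draw_samples(unsigned seed, std::size_t n) {
    std::mt19937 gen(seed);
    std::vector<double> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = uniform01(gen);
    }
    return out;
}
```

Seeding each thread with a different integer is the simplest choice, but it does not guarantee that the streams are statistically independent. Treat that part as an assumption to revisit if the quality of parallel streams matters.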


Regarding the use of discrete_distribution:

This is a bit harder because it is less obvious how to wrap the constructors for discrete_distribution (they involve iterators). I think the easiest thing to do is to go via a C++ vector, since support for that is built into Cython and it is readily convertible to/from a Python list.

# use Cython's built in wrapping of std::vector
from libcpp.vector cimport vector

cdef extern from "<random>" namespace "std":
    # mt19937 as before
    
    cdef cppclass discrete_distribution[T]:
        discrete_distribution()
        # The following constructor is really a more generic template class
        # but tell Cython it only accepts vector iterators
        discrete_distribution(vector[double].iterator first, vector[double].iterator last)
        T operator()(mt19937 gen)

# an example function
def test2():
    cdef:
        mt19937 gen = mt19937(5)
        vector[double] values = [1,3,3,1] # autoconvert vector from Python list
        discrete_distribution[int] dd = discrete_distribution[int](values.begin(),values.end())
    return dd(gen)

Obviously that's a bit more involved than the uniform distribution, but it's not impossibly complicated (and the nasty bits could be hidden inside a Cython function).
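
For comparison, here is roughly what the wrapped constructor call looks like in plain C++. This is a sketch only, and `sample_index` is a hypothetical name. The weights do not need to sum to 1: `discrete_distribution` normalises them, so `{1, 3, 3, 1}` gives indices 0 to 3 with probabilities 1/8, 3/8, 3/8 and 1/8.

```cpp
#include <random>
#include <vector>

// Hypothetical helper: returns an index in [0, weights.size()) chosen with
// probability proportional to the corresponding weight.
int sample_index(std::mt19937 &gen, const std::vector<double> &weights) {
    // The iterator-pair constructor is the one the Cython declaration wraps.
    std::discrete_distribution<int> dd(weights.begin(), weights.end());
    return dd(gen);
}
```

Building the distribution on every call keeps the sketch short. If you draw many samples from the same weights, constructing it once and reusing it avoids recomputing the probabilities each time.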
