Performance of numpy.random.choice

Question

I updated the code and the timings.

I'm trying to improve the performance of a function in my code. I must generate a list with random elements. However, different parts of the list must be filled with elements taken from different sets. An example of the code is below. I must generate millions of lists like those, one at a time.

Function foo1 is the fastest, but it does not do what I need. It is there for performance reference. Functions foo2 and foo3 do what I need, but spend almost three times the processing time of foo1.

Python 2.7.9 (default, Feb 10 2015, 03:29:19). [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.56)] on darwin. numpy.version '1.8.1'

import numpy

import timeit

_ops_1 = ["-123.456", "3.1416", "1", "2"]
_ops_2 = ["ABC", "XYZ", 'A', 'B', 'C']

size = 10

def foo1 (): 
    return numpy.random.choice(_ops_1 + _ops_2, 5*size)

def foo2 (): 
    return list(numpy.concatenate((numpy.random.choice(_ops_1, 2*size), 
        numpy.random.choice(_ops_1 + _ops_2, size),
        numpy.random.choice(_ops_2, 2*size)), 0))

def foo3 (): 
    return numpy.random.choice(_ops_1, 2*size).tolist() + \
        numpy.random.choice(_ops_1 + _ops_2, size).tolist() + \
        numpy.random.choice(_ops_2, 2*size).tolist()

### Suggested by Divakar
def random_choice_replace_True(arr,size):
    return numpy.take(arr,numpy.random.randint(0,len(arr),size))

def foo4 (): 
    return random_choice_replace_True(_ops_1, 2*size).tolist() + \
        random_choice_replace_True(_ops_1 + _ops_2, size).tolist() + \
        random_choice_replace_True(_ops_2, 2*size).tolist()

### 2nd suggestion by Divakar
def random_choice_replace_True_idx(arr,size):
    return numpy.array(arr)[numpy.random.randint(0,len(arr),size)]

def foo5 (): 
    return random_choice_replace_True_idx(_ops_1, 2*size).tolist() + \
        random_choice_replace_True_idx(_ops_1 + _ops_2, size).tolist() + \
        random_choice_replace_True_idx(_ops_2, 2*size).tolist()

###########

setup = '''import numpy

_ops_1 = ["-123.456", "3.1416", "1", "2"]
_ops_2 = ["ABC", "XYZ", 'A', 'B', 'C']

size = 10'''

# As required, Number was increased to 10 million to get closer to actual timings
timeit.timeit(foo1, setup=setup, number=10000000)

timeit.timeit(foo2, setup=setup, number=10000000)

timeit.timeit(foo3, setup=setup, number=10000000)

timeit.timeit(foo4, setup=setup, number=10000000)

timeit.timeit(foo5, setup=setup, number=10000000)

The running times on my machine were:

timeit.timeit(foo1, setup=setup, number=10000000) 235.22050380706787

timeit.timeit(foo2, setup=setup, number=10000000) 760.1884841918945

timeit.timeit(foo3, setup=setup, number=10000000) 560.77258586883545

timeit.timeit(foo4, setup=setup, number=10000000) 388.69550228118896

timeit.timeit(foo5, setup=setup, number=10000000) 252.32089233398438

Well, for now I'll take the 2nd suggestion made by Divakar, which is pretty good. But other suggestions are welcome!

Recommended answer

np.random.choice, with its optional argument replace set to True (the default), returns randomly chosen elements from the input array, and those elements may be repeated. We can simulate this behavior by generating random indices that cover the length of the array and indexing into the array with them. Thus, the built-in can be simulated with something like this:

import numpy as np

def random_choice_replace_True(A,size):
    return np.array(A)[np.random.randint(0,len(A),size)]

If you are dealing with inputs that are already NumPy arrays, you can skip the np.array(A) part for conversion and simply use A there.
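As a rough illustration of that last point, here is a minimal sketch that converts the question's two pools to NumPy arrays once, outside the function that gets called millions of times. The names OPS_1, OPS_2, OPS_ALL, pick and make_list are hypothetical and introduced only for this example. Whether this is actually faster than foo5 on a given machine would need to be measured.

```python
import numpy as np

# Pools from the question, converted to arrays once (assumed to be constant).
OPS_1 = np.array(["-123.456", "3.1416", "1", "2"])
OPS_2 = np.array(["ABC", "XYZ", "A", "B", "C"])
OPS_ALL = np.concatenate((OPS_1, OPS_2))

def pick(arr, size):
    """Draw `size` elements from `arr` uniformly, with replacement."""
    # randint's upper bound is exclusive, so indices fall in [0, len(arr)).
    return arr[np.random.randint(0, len(arr), size)]

def make_list(size=10):
    """Build one list: 2*size from OPS_1, size from both pools, 2*size from OPS_2."""
    return (pick(OPS_1, 2 * size).tolist()
            + pick(OPS_ALL, size).tolist()
            + pick(OPS_2, 2 * size).tolist())

result = make_list(10)
```

The layout of the result matches foo3 through foo5 in the question: the first 2*size entries come from the first pool, the middle size entries from the combined pool, and the last 2*size entries from the second pool.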
