A Cartesian product function that can yield chunks of the result for large arrays

Problem description

@Paul Panzer shared an excellent answer on how to compute the Cartesian product of a list of NumPy arrays efficiently. I have modified his cartesian_product_transpose_pp(arrays) function so that the iteration proceeds from the left column to the right column of the returned array, i.e. the leftmost column varies fastest.

```python
import itertools
import time

import numpy as np


def cartesian_product_transpose_pp(arrays):
    la = len(arrays)
    dtype = np.result_type(*arrays)
    arr = np.empty((la, *map(len, arrays)), dtype=dtype)
    idx = slice(None), *itertools.repeat(None, la)
    for i, a in enumerate(arrays):
        # My modification: array i is broadcast along axis -(i + 1), so the
        # leftmost column of the result varies fastest. (With idx[:i], arrays
        # 0 and 1 would both land on the last axis and give identical columns.)
        arr[i, ...] = a[idx[:i + 1]]
    return arr.reshape(la, -1).T


mumax = 18
mumin = 1
nsample = 8
mu_array = np.arange(mumin, mumax + 1, dtype=np.uint8)  # 1, 2, ..., 18
mu_alist = [mu_array] * nsample

start = time.perf_counter()
cartesian_product_transpose_pp(mu_alist)
end = time.perf_counter()
print(f'\ncartesian_product_transpose_pp Time: {end - start} sec')
```
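A quick way to check the iteration order on small inputs, as a sketch: the function below writes the broadcasting slice as idx[:i + 1]. With idx[:i], both idx[:0] and idx[:1] leave the array 1-D, so arrays 0 and 1 would broadcast along the same last axis and produce identical columns. The comparison relies only on the documented behaviour of itertools.product, which varies its last argument fastest.

```python
import itertools

import numpy as np


def cartesian_product_transpose_pp(arrays):
    """Cartesian product whose leftmost column varies fastest."""
    la = len(arrays)
    dtype = np.result_type(*arrays)
    arr = np.empty((la, *map(len, arrays)), dtype=dtype)
    idx = slice(None), *itertools.repeat(None, la)
    for i, a in enumerate(arrays):
        # idx[:i + 1] puts array i on axis -(i + 1) of the product grid,
        # so array 0 sits on the last (fastest-varying) axis.
        arr[i, ...] = a[idx[:i + 1]]
    return arr.reshape(la, -1).T


arrays = [np.array([1, 2]), np.array([3, 4]), np.array([5, 6])]
out = cartesian_product_transpose_pp(arrays)
print(out)

# itertools.product varies its *last* argument fastest, so reversing the
# inputs and then each output row gives the same left-fastest order.
expected = [row[::-1] for row in itertools.product(*arrays[::-1])]
print(np.array_equal(out, np.array(expected)))  # True
```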

However, when this function's argument (i.e. arrays) exceeds a certain size, it needs a very large arr and fails with a MemoryError. Example:

```
arr = np.empty( ( la, *map(len, arrays) ), dtype=dtype )
MemoryError: Unable to allocate 82.1 GiB for an array with shape (8, 18, 18, 18, 18, 18, 18, 18, 18) and data type uint8
```
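The 82.1 GiB in the message is simply the size of arr: la × 18^la one-byte elements for la = 8 uint8 arrays of length 18. A small helper (hypothetical name product_nbytes) makes the arithmetic explicit; each extra array multiplies the requirement by roughly 18, which is why a single allocation cannot keep up as nsample grows.

```python
import math


def product_nbytes(lengths, itemsize=1):
    """Bytes of arr = np.empty((len(lengths), *lengths)) with the given itemsize
    (1 byte per element for uint8)."""
    return len(lengths) * math.prod(lengths) * itemsize


nbytes = product_nbytes([18] * 8)
print(nbytes, f"{nbytes / 2**30:.1f} GiB")  # 88159684608 82.1 GiB
```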

To address this memory error, I would like to break arr into smaller chunks so that I can yield smaller chunks of arr.reshape(la, -1).T. How can I do this as the value of nsample increases?
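One way to get bounded chunks, sketched here with a different technique from the broadcasting above: number the rows of the product 0, 1, 2, ..., and for each range of row numbers let np.unravel_index recover one index per input array. With order="F" the first index varies fastest, which matches the left-to-right order. The function name cartesian_product_chunks and the chunk_rows parameter are hypothetical.

```python
import math

import numpy as np


def cartesian_product_chunks(arrays, chunk_rows):
    """Yield the Cartesian product of `arrays` in blocks of at most
    `chunk_rows` rows, leftmost column varying fastest."""
    shape = tuple(len(a) for a in arrays)
    total = math.prod(shape)  # Python int, so no overflow for huge products
    dtype = np.result_type(*arrays)
    for start in range(0, total, chunk_rows):
        stop = min(start + chunk_rows, total)
        flat = np.arange(start, stop, dtype=np.int64)
        # order="F": the first index varies fastest as `flat` increases.
        multi = np.unravel_index(flat, shape, order="F")
        block = np.empty((stop - start, len(arrays)), dtype=dtype)
        for col, (a, ix) in enumerate(zip(arrays, multi)):
            block[:, col] = a[ix]  # look up each array's value for every row
        yield block


small = [np.array([1, 2]), np.array([10, 20, 30]), np.array([7, 8])]
for block in cartesian_product_chunks(small, chunk_rows=5):
    print(block.shape)
```

Peak memory per chunk is roughly chunk_rows × len(arrays) × 9 bytes (8-byte int64 indices plus 1-byte uint8 values) plus the flat index array, independent of nsample; for eight arrays and chunk_rows=1_000_000 that is on the order of 80 MB. Computing indices is slower than broadcasting, so this trades speed for a fixed footprint.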

Update: the test code I am currently using:

```python
import itertools
import sys
import time

import numpy as np


def cartesian_product_transpose_pp(arrays):
    la = len(arrays)
    dtype = np.result_type(*arrays)
    arr = np.empty((la, *map(len, arrays)), dtype=dtype)
    idx = slice(None), *itertools.repeat(None, la)
    for i, a in enumerate(arrays):
        arr[i, ...] = a[idx[:i + 1]]  # leftmost column varies fastest
    return arr.reshape(la, -1).T


mumax = 18
mumin = 1
nsample = 9
mu_array = np.arange(mumin, mumax + 1, dtype=np.uint8)
mu_alist = [mu_array] * nsample

a = mu_alist
start = time.perf_counter()
c = 1  # number of leading arrays fixed per chunk
# For each row x of the product of the first c arrays, build the product of
# x's fixed values (as length-1 arrays) with the remaining arrays a[c:].
result = (
    cartesian_product_transpose_pp([*x[:, None], *a[c:]])
    for x in cartesian_product_transpose_pp(a[:c])
)
with np.printoptions(threshold=sys.maxsize):
    for n, chunk in enumerate(result):
        # print(n, chunk)  # for debugging
        # Use a new name: the generator reads a[c:] lazily, so rebinding
        # `a` here would change the input of the next chunk.
        last_chunk = chunk
end = time.perf_counter()
print(f'\ncartesian_product_transpose_pp Time: {end - start} sec')
```

Error message:

```
    arr = np.empty((la, *map(len, arrays)), dtype=dtype)
MemoryError: Unable to allocate 92.4 GiB for an array with shape (9, 1, 18, 18, 18, 18, 18, 18, 18, 18) and data type uint8
```

Recommended answer

I also had to solve this problem, but making it fast requires numba. Here is my answer in a similar thread (originally limited to a regular Cartesian product).
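The referenced answer's code is not included on this page, so the following is only an illustration of the loop-based style that numba accelerates, not that answer's implementation. It fills an output block row by row with a mixed-radix counter, starting from an arbitrary row number, so any chunk can be produced without touching the rest of the product. It is plain Python and slow for large blocks; compiling a loop like this with numba's @njit is the assumed speedup route (not applied or tested here), and the name fill_product_block is hypothetical.

```python
import numpy as np


def fill_product_block(arrays, start, out):
    """Write rows start .. start + len(out) - 1 of the left-fastest
    Cartesian product of `arrays` into `out` (shape (rows, len(arrays)))."""
    la = len(arrays)
    # Decode `start` into one digit per array, digit 0 varying fastest.
    digits = [0] * la
    rem = start
    for j in range(la):
        digits[j] = rem % len(arrays[j])
        rem //= len(arrays[j])
    for r in range(out.shape[0]):
        for j in range(la):
            out[r, j] = arrays[j][digits[j]]
        # Increment the counter: bump digit 0 and carry to the right.
        j = 0
        while j < la:
            digits[j] += 1
            if digits[j] < len(arrays[j]):
                break
            digits[j] = 0
            j += 1
    return out


arrays = [np.array([1, 2]), np.array([10, 20, 30]), np.array([7, 8])]
block = fill_product_block(arrays, 4, np.empty((3, 3), dtype=np.int64))
print(block.tolist())  # [[1, 30, 7], [2, 30, 7], [1, 10, 8]]
```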
