Fastest way to create and fill huge numpy 2D-array?

Question

I have to create and fill a huge array (e.g. 96 GB, 72000 rows * 72000 columns) with floats, where the value in each cell comes from a mathematical formula. Further computations will be run on the array afterwards.
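As a rough sanity check on the sizes involved, the raw buffer of a dense array can be estimated from its shape and element type. The sketch below assumes a plain `numpy.ndarray` of `float64` or `float32` (the question does not state the dtype) and uses a hypothetical helper name, `array_gib`. For 72000 * 72000 `float64` values this comes to roughly 41.5 GB (about 38.6 GiB). It counts only the array buffer itself; intermediate Python lists of floats, as built in the code below, need considerably more.

```python
import numpy as np


def array_gib(rows, cols, dtype):
    """Raw buffer size of a dense rows x cols array, in GiB (hypothetical helper)."""
    return rows * cols * np.dtype(dtype).itemsize / 2**30


if __name__ == "__main__":
    for dtype in (np.float64, np.float32):
        size = array_gib(72000, 72000, dtype)
        print(np.dtype(dtype).name, round(size, 1), "GiB")
```

If single precision is acceptable for the formulas, `float32` halves the footprint.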

import itertools
import operator
import time
import numpy
from multiprocessing import Pool


def f2(x):
    # Stand-in for more complex formulas that change according to the values in i and x.
    temp = []
    for i in combine:
        temp.append(0.2 * x[1] * i[1] / 64.23)
    return temp


def combinations_with_replacement_counts(n, r):
    # Yield every way to distribute r balls among n boxes, as a tuple of n counts.
    size = n + r - 1
    for indices in itertools.combinations(range(size), n - 1):
        starts = [0] + [index + 1 for index in indices]
        stops = indices + (size,)
        yield tuple(map(operator.sub, stops, starts))


# Defined at module level so that f2 can read it inside the worker processes.
combine = list(combinations_with_replacement_counts(3, 60))  # 60 used here; the real case needs 350

if __name__ == '__main__':
    print(len(combine))
    t1 = time.time()
    pool = Pool()  # start worker processes
    results = [pool.apply_async(f2, (x,)) for x in combine]
    roots = [r.get() for r in results]
    print(roots[0:3])
    pool.close()
    pool.join()
    print(time.time() - t1)
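For the simplified formula in `f2` above, each entry depends only on one value from the row item and one from the column item, so the per-element Python loop and the intermediate lists can be avoided. The following is a minimal sketch, not the asker's real formulas: it assumes the values `c[1] for c in combine` have already been gathered into a 1D array (here a small hypothetical stand-in called `values`), preallocates the result, and fills it one block of rows at a time with numpy broadcasting. The function name `fill_blockwise` and the `block_rows` parameter are illustrative.

```python
import numpy as np


def fill_blockwise(values, out, block_rows=1024):
    """Fill out[a, b] = 0.2 * values[a] * values[b] / 64.23, one row block at a time."""
    n = values.shape[0]
    for start in range(0, n, block_rows):
        stop = min(start + block_rows, n)
        # Broadcasting: (block, 1) * (1, n) -> (block, n), written directly into out.
        np.multiply(values[start:stop, None], values[None, :], out=out[start:stop])
        out[start:stop] *= 0.2 / 64.23
    return out


# Hypothetical small input standing in for [c[1] for c in combine].
values = np.arange(5, dtype=np.float64)
result = np.empty((values.size, values.size), dtype=np.float64)
fill_blockwise(values, result, block_rows=2)
```

Because every block is written in place, peak memory stays close to the size of the final array. Whether the real formulas can be expressed with array operations like this depends on details the question does not show.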

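  • What is the fastest way to create and fill such a huge numpy array? Filling lists, then aggregating them, then converting to a numpy array?
  • Can the computation be parallelized, given that the cells/columns/rows of the 2D array are independent, to speed up filling the array? Any hints or leads for optimizing such a computation with multiprocessing?

Recommended answer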

    I know that you can create shared numpy arrays that can be modified from different processes (assuming that the modified regions don't overlap). Here is a sketch of code you can use to do that (I saw the original idea somewhere on Stack Overflow; edit: here it is https://stackoverflow.com/a/5550156/1269140 )

    import ctypes
    import multiprocessing as mp
    import numpy as np

    def shared_zeros(n1, n2):
        # Create a zero-filled 2D numpy array backed by shared memory,
        # so that several processes can write to it.
        shared_array_base = mp.Array(ctypes.c_double, n1 * n2)
        shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
        shared_array = shared_array.reshape(n1, n2)
        return shared_array

    class singleton:
        arr = None

    def dosomething(i):
        # do something with singleton.arr
        singleton.arr[i, :] = i
        return i

    def main():
        # The workers see singleton.arr only because it is set before the pool
        # is created and the worker processes are forked ('fork' start method).
        singleton.arr = shared_zeros(1000, 1000)
        pool = mp.Pool(16)
        pool.map(dosomething, range(1000))
        pool.close()
        pool.join()

    if __name__ == '__main__':
        main()
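The sketch above relies on the worker processes inheriting `singleton.arr` when they are forked. Under the `spawn` start method (the default on Windows and macOS) the module is re-imported in each worker and `singleton.arr` would be `None` there. A possible variant, shown here under stated assumptions rather than as the answerer's method, passes the shared buffer to each worker through the `initializer` argument of `multiprocessing.Pool`. The names `_shared`, `init_worker`, `fill_row` and `fill_shared` are hypothetical, and `mp.RawArray` is used instead of `mp.Array` because rows written by different workers do not overlap, so no lock is needed.

```python
import ctypes
import multiprocessing as mp

import numpy as np

_shared = {}  # per-process holder for the numpy view (hypothetical name)


def init_worker(raw, shape):
    # Runs once in every worker: wrap the shared buffer without copying it.
    _shared["arr"] = np.frombuffer(raw, dtype=np.float64).reshape(shape)


def fill_row(i):
    # Each task writes one row; rows handled by different tasks never overlap.
    _shared["arr"][i, :] = i
    return i


def fill_shared(n_rows, n_cols, processes=2):
    # Unsynchronized, zero-initialized shared memory for n_rows * n_cols doubles.
    raw = mp.RawArray(ctypes.c_double, n_rows * n_cols)
    with mp.Pool(processes, initializer=init_worker,
                 initargs=(raw, (n_rows, n_cols))) as pool:
        pool.map(fill_row, range(n_rows))
    # View the filled buffer from the parent process, again without copying.
    return np.frombuffer(raw, dtype=np.float64).reshape(n_rows, n_cols)


if __name__ == "__main__":
    arr = fill_shared(8, 4)
    print(arr[:, 0])
```

In a real run, `fill_row` would compute a whole row (or a block of rows) from the formulas, so that each task does enough work to outweigh the cost of dispatching it.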
    
