Multithreading a numpy nditerator

Published: 2021/5/18 18:40:16. Tags: python, multithreading, numpy, iterator, multiprocessing

Problem Description

For an MCMC implementation, I want to calculate the covariance tensor C in numpy.

The distance between two elements is based on the distance between their indices. For reference, here is the working single-threaded code (with an example distance):

import numpy as np

#set size, dimensions, etc
size = 20
ndim = 2
shape = (size,)*ndim*2

#initialize tensor
C = np.zeros(shape)
#example distance
dist = lambda x, y: np.sqrt(np.sum((x-y)**2))

#this runs as a class method, so please forgive my sloppy coding here
def update_tensor():
    it = np.nditer(C, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = np.array(it.multi_index)
        it[0] = dist(idx[:idx.shape[0]//2], idx[idx.shape[0]//2:])
        it.iternext()

update_tensor()

Solution Attempt

The issue is that, while applying C to a matrix x is a multithreaded operation:

x = np.random.standard_normal((size,)*ndim)
result = np.tensordot(C, x, axes=ndim)

calculating the entries of C is not. My idea was to split C along its first axis after initialization and iterate over the chunks separately:

import multiprocessing
def _calc_distances(C):
    'Calculate distances of submatrices'
    it = np.nditer(C, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = np.array(it.multi_index)
        it[0] = dist(idx[:idx.shape[0]//2], idx[idx.shape[0]//2:])
        it.iternext()
    return C

def update_tensor(C):
    'Updates Covariance Operator'   
    #Multicore Processing
    n_processes = multiprocessing.cpu_count()
    Chunks = [
        C[i*C.shape[0]//n_processes:(i+1)*C.shape[0]//n_processes] for i in range(0, n_processes-1)
    ]
    Chunks.append(C[C.shape[0]//n_processes*(n_processes-1):])
    with multiprocessing.Pool(n_processes+1) as p:
        #map and stitch together
        C = np.concatenate(
            p.map(_calc_distances, Chunks)
        )

But this fails, because the indices of the submatrices change: each chunk's iterator starts counting from zero again.
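The index shift can be seen in a small sketch. The array shape and the `start` row below are arbitrary values chosen for illustration; the point is only that `nditer` reports indices relative to the chunk, so the chunk's starting row has to be added back to the first component.

```python
import numpy as np

C = np.zeros((4, 3))
start = 2                  # illustrative start row of the chunk
chunk = C[start:]          # a view onto rows 2 and 3 of C

# Collect the indices that nditer reports for the chunk.
it = np.nditer(chunk, flags=['multi_index'])
local_indices = []
while not it.finished:
    local_indices.append(it.multi_index)
    it.iternext()

# Shift the first component by the chunk's start row to recover
# the indices the same elements have in the full array C.
global_indices = [(i + start, j) for i, j in local_indices]

print(local_indices[0], global_indices[0])
```

The first local index is relative to the chunk, while the shifted one refers to the position in `C`.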

Is there a nicer solution to this? How do I fix the index issue? Probably the nicest way would be to just iterate over parts of the array with threads sharing the data of C. Is that possible?
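As a rough sketch of the thread-based idea, assuming the example Euclidean distance from the question: each thread receives the full array `C` plus a row range, iterates over a view of that range, and writes in place. The helper name `fill_block` and the worker count are made up for this illustration. Note that in standard CPython builds the GIL serializes a pure-Python loop like this one, so the sketch shows that sharing the data works, not that it runs faster.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def fill_block(C, start, stop):
    """Fill C[start:stop] in place; C is shared by all threads (hypothetical helper)."""
    block = C[start:stop]          # a view, so writes land in C itself
    if block.size == 0:            # nditer rejects empty arrays by default
        return
    half = C.ndim // 2
    it = np.nditer(block, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = np.array(it.multi_index)
        idx[0] += start            # local first index -> index in the full tensor
        it[0] = np.sqrt(np.sum((idx[:half] - idx[half:]) ** 2))
        it.iternext()

size, ndim = 6, 2
C = np.zeros((size,) * ndim * 2)

n_workers = 3                      # arbitrary choice for the sketch
bounds = [i * size // n_workers for i in range(n_workers + 1)]

with ThreadPoolExecutor(max_workers=n_workers) as pool:
    futures = [pool.submit(fill_block, C, a, b)
               for a, b in zip(bounds[:-1], bounds[1:])]
    for f in futures:
        f.result()                 # re-raises any exception from a worker
```

Because the row ranges do not overlap, no two threads write to the same element, and no concatenation step is needed afterwards.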

Q: Do you have to use a numpy iterator? A: No, it’s nice, but I can give up on that.
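If the iterator is optional, one possible alternative is to compute the whole tensor with broadcasting instead of a Python-level loop. The sketch below assumes the example Euclidean distance from the question; the function name `distance_tensor` is made up here, and `np.indices(..., sparse=True)` requires NumPy 1.17 or later. This is a different technique from the one in the answer, not a restatement of it.

```python
import numpy as np

def distance_tensor(size, ndim):
    """Index-distance tensor of shape (size,)*ndim*2, built without nditer."""
    shape = (size,) * ndim * 2
    # One broadcastable index array per axis, e.g. shapes (n,1,1,1), (1,n,1,1), ...
    idx = np.indices(shape, sparse=True)
    # Axis k belongs to the first point, axis k + ndim to the second point.
    squared = sum((idx[k] - idx[k + ndim]) ** 2 for k in range(ndim))
    return np.sqrt(squared)

C = distance_tensor(size=20, ndim=2)
print(C.shape)
```

The intermediate arrays have the full tensor's size, so this trades memory for avoiding the per-element Python overhead.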

Recommended Answer

Worked like this. Just going to post the class here.

CPU: Intel Core i5-6300U@2.5GHz, boosting to ~2.9GHz
Windows 10 64-bit, Python 3.7.4, Numpy 1.17

Pro: Less compute time.
Con: Uses a little more RAM; somewhat complicated code.

import multiprocessing
import numpy as np

class CovOp(object):
    'F[0,1]^ndim->C[0,1]^ndim'
    def f(self, r):
        return np.exp(-r/self.ro)#(1 + np.sqrt(3)*r / self.ro) * np.exp(-np.sqrt(3) * r / self.ro)

    def dist(self, x,y):
        return np.sum((x-y)**2)

    def __init__(self, ndim, size, sigma=1, ro=1):
        self.tensor_cached = False
        self.inverse_cached = False
        self.ndim = ndim
        self.size = size
        self.shape = (size,)*ndim*2
        self.C = np.zeros(self.shape)
        self.Inv = np.zeros(self.shape)
        self.ro = ro * size
        self.sigma = sigma      

    def __call__(self, x):
        if not self.tensor_cached:
            self.update_tensor()
        if self.ndim == 0:
            return self.sigma * self.C * x
        elif self.ndim == 1:
            return self.sigma * np.dot(self.C, x)
        return self.sigma * np.tensordot(self.C, x, axes=self.ndim)

    def _calc_distances(self, Chunk:tuple):
        'Calculate distances of submatrices'
        C, offset = Chunk
        it = np.nditer(C, flags=['multi_index'], op_flags=['readwrite'])
        while not it.finished:
            idx = np.array(it.multi_index)
            idx[0]+=offset
            d = self.dist(idx[:idx.shape[0]//2], idx[idx.shape[0]//2:])
            it[0] = self.f(d)
            it.iternext()
        return C

    def update_tensor(self):
        'Updates Covariance Operator'   
        #Multicore Processing
        n_processes = multiprocessing.cpu_count()
        Chunks = [
            (
                self.C[i*self.C.shape[0]//n_processes:(i+1)*self.C.shape[0]//n_processes],
                i*self.C.shape[0]//n_processes) for i in range(0, n_processes-1)
        ]
        Chunks.append((
                self.C[self.C.shape[0]//n_processes*(n_processes-1):],
                self.C.shape[0]//n_processes*(n_processes-1)
            )
        )
        with multiprocessing.Pool(n_processes+1) as p:
            self.C = np.concatenate(
                p.map(self._calc_distances, Chunks)
            )      
        self.tensor_cached = True
        #missing cholesky decomposition

    def update_inverse(self):
        if self.ndim==1:
            self.Inv = np.linalg.inv(self.C)
        elif self.ndim>1:
            self.Inv = np.linalg.tensorinv(self.C, ind=self.ndim)
        else:
            self.Inv = 1/self.C
        self.inverse_cached = True

    def inv(self, x):
        if self.ndim == 0:
            return self.Inv * x / self.sigma
        elif self.ndim == 1:
            return np.dot(self.Inv, x) / self.sigma
        return np.tensordot(self.Inv, x, axes=self.ndim) / self.sigma

if __name__=='__main__':

        size = 30
        ndim = 2
        depth = 1

        Cov = CovOp(ndim, size, 1, .2)


        import time

        n_tests = 5
        t_start = time.perf_counter()
        for i in range(n_tests):
            Cov.update_tensor()
        t_stop = time.perf_counter()
        dt_new = t_stop - t_start

        print(
        '''Benchmark; NDim: %s, Size: %s NTests: %s
        Mean time per test:
            Multithreaded %ss'''%(ndim, size, n_tests, dt_new/n_tests)
        )
