Multithreading a numpy nditerator
Question
For an MCMC implementation, I want to calculate the covariance tensor C in numpy.
The distance between two elements is based on the distance between their indices. For reference, here is the working single-threaded code (with an example distance):
import numpy as np

# set size, dimensions, etc.
size = 20
ndim = 2
shape = (size,)*ndim*2

# initialize tensor
C = np.zeros(shape)

# example distance
dist = lambda x, y: np.sqrt(np.sum((x-y)**2))

# this runs as a class method, so please forgive my sloppy coding here
def update_tensor():
    it = np.nditer(C, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = np.array(it.multi_index)
        it[0] = dist(idx[:idx.shape[0]//2], idx[idx.shape[0]//2:])
        it.iternext()

update_tensor()
Solution Attempt
The issue is that, while applying C to a matrix x is a multithreaded operation:
x = np.random.standard_normal((size,)*ndim)
result = np.tensordot(C, x, axes=ndim)
calculating the entries of C is not. My idea was to split C along its first axis after initialization and iterate over the chunks separately:
import multiprocessing

def _calc_distances(C):
    'Calculate distances of submatrices'
    it = np.nditer(C, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = np.array(it.multi_index)
        it[0] = dist(idx[:idx.shape[0]//2], idx[idx.shape[0]//2:])
        it.iternext()
    return C

def update_tensor(C):
    'Updates Covariance Operator'
    # Multicore Processing
    n_processes = multiprocessing.cpu_count()
    Chunks = [
        C[i*C.shape[0]//n_processes:(i+1)*C.shape[0]//n_processes] for i in range(0, n_processes-1)
    ]
    Chunks.append(C[C.shape[0]//n_processes*(n_processes-1):])
    with multiprocessing.Pool(n_processes+1) as p:
        # map and stitch together
        C = np.concatenate(
            p.map(_calc_distances, Chunks)
        )
But this fails, because the indices seen inside each submatrix are relative to that submatrix and no longer match the indices in C.
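The index shift can be seen in a small sketch. The shape, the chunk start and the name `offset` are illustrative assumptions, not part of the original code; the point is only that `nditer` numbers positions relative to the array it is given.

```python
import numpy as np

C = np.zeros((4, 2, 4, 2))
offset = 2                 # hypothetical start row of the chunk
chunk = C[offset:]         # a view on rows 2..3 of the first axis

# nditer reports positions relative to the chunk, not to C
it = np.nditer(chunk, flags=['multi_index'])
first_local = it.multi_index

# adding the chunk's start row to the first component restores the position in C
first_global = (first_local[0] + offset,) + first_local[1:]

print(first_local, first_global)
```

Only the first component needs correcting here, because the chunks are cut along the first axis only.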
Is there a nicer solution to this? How do I fix the index issue? Probably the nicest way would be to just iterate over parts of the array with threads sharing the data of C. Is that possible?
Q: Do you have to use a numpy iterator? A: No, it’s nice, but I can give up on that.
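On the question of threads sharing the data of C: a slice of a NumPy array is a view, so worker threads can write into their own slices of the same array without copying. The following is a minimal sketch under that assumption; `fill_rows`, `update_tensor_threaded` and `n_workers` are hypothetical names, and the distance is the Euclidean example from the question. In a standard CPython build the global interpreter lock means a pure-Python loop like this is unlikely to run faster on threads, so this shows the sharing mechanism rather than a guaranteed speedup.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def fill_rows(C, start, stop):
    # C[start:stop] is a view, so writes land directly in the shared array
    block = C[start:stop]
    half = C.ndim // 2
    it = np.nditer(block, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = np.array(it.multi_index)
        idx[0] += start  # convert the chunk-local index back to an index in C
        it[0] = np.sqrt(np.sum((idx[:half] - idx[half:]) ** 2))
        it.iternext()

def update_tensor_threaded(C, n_workers=4):
    # split the first axis into n_workers contiguous row ranges
    bounds = np.linspace(0, C.shape[0], n_workers + 1).astype(int)
    with ThreadPoolExecutor(n_workers) as pool:
        futures = [
            pool.submit(fill_rows, C, int(a), int(b))
            for a, b in zip(bounds[:-1], bounds[1:])
            if b > a  # skip empty ranges when there are more workers than rows
        ]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return C
```

Because each thread writes to a disjoint range of rows, no locking is needed for the writes themselves.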
Recommended Answer
Worked like this. Just going to post the class here.
CPU: Intel Core i5-6300U@2.5GHz, boosting to ~2.9GHz
Windows 10 64-bit, Python 3.7.4, Numpy 1.17
Pro: less compute time.
Con: uses a little more RAM; somewhat complicated code.
import multiprocessing
import numpy as np

class CovOp(object):
    'F[0,1]^ndim->C[0,1]^ndim'

    def f(self, r):
        return np.exp(-r/self.ro)  # (1 + np.sqrt(3)*r / self.ro) * np.exp(-np.sqrt(3) * r / self.ro)

    def dist(self, x, y):
        return np.sum((x-y)**2)

    def __init__(self, ndim, size, sigma=1, ro=1):
        self.tensor_cached = False
        self.inverse_cached = False
        self.ndim = ndim
        self.size = size
        self.shape = (size,)*ndim*2
        self.C = np.zeros(self.shape)
        self.Inv = np.zeros(self.shape)
        self.ro = ro * size
        self.sigma = sigma

    def __call__(self, x):
        if not self.tensor_cached:
            # the original line lacked the parentheses, so the method was never called
            self.update_tensor()
        if self.ndim == 0:
            return self.sigma * self.C * x
        elif self.ndim == 1:
            return self.sigma * np.dot(self.C, x)
        return self.sigma * np.tensordot(self.C, x, axes=self.ndim)

    def _calc_distances(self, Chunk: tuple):
        'Calculate distances of submatrices'
        C, offset = Chunk
        it = np.nditer(C, flags=['multi_index'], op_flags=['readwrite'])
        while not it.finished:
            idx = np.array(it.multi_index)
            # shift the chunk-local first index back to its position in the full tensor
            idx[0] += offset
            d = self.dist(idx[:idx.shape[0]//2], idx[idx.shape[0]//2:])
            it[0] = self.f(d)
            it.iternext()
        return C

    def update_tensor(self):
        'Updates Covariance Operator'
        # Multicore Processing
        n_processes = multiprocessing.cpu_count()
        # each chunk is a (slice of C, starting row of that slice) pair
        Chunks = [
            (
                self.C[i*self.C.shape[0]//n_processes:(i+1)*self.C.shape[0]//n_processes],
                i*self.C.shape[0]//n_processes
            ) for i in range(0, n_processes-1)
        ]
        Chunks.append(
            (
                self.C[self.C.shape[0]//n_processes*(n_processes-1):],
                self.C.shape[0]//n_processes*(n_processes-1)
            )
        )
        with multiprocessing.Pool(n_processes+1) as p:
            self.C = np.concatenate(
                p.map(self._calc_distances, Chunks)
            )
        self.tensor_cached = True
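
    # missing cholesky decomposition
    def update_inverse(self):
        if self.ndim == 1:
            self.Inv = np.linalg.inv(self.C)
        elif self.ndim > 1:
            # ind=self.ndim splits the 2*ndim axes evenly (the default ind=2 only fits ndim == 2)
            self.Inv = np.linalg.tensorinv(self.C, ind=self.ndim)
        else:
            self.Inv = 1/self.C
        self.inverse_cached = True

    def inv(self, x):
        if self.ndim == 0:
            return self.Inv * x / self.sigma
        elif self.ndim == 1:
            return np.dot(self.Inv, x) / self.sigma
        # contract over ndim axes, matching __call__ (the default axes=2 only fits ndim == 2)
        return np.tensordot(self.Inv, x, axes=self.ndim) / self.sigma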

if __name__ == '__main__':
    size = 30
    ndim = 2
    depth = 1
    Cov = CovOp(ndim, size, 1, .2)

    import time
    n_tests = 5
    t_start = time.perf_counter()
    for i in range(n_tests):
        Cov.update_tensor()
    t_stop = time.perf_counter()
    dt_new = t_stop - t_start
    print(
        '''Benchmark; NDim: %s, Size: %s NTests: %s
        Mean time per test:
        Multithreaded %ss''' % (ndim, size, n_tests, dt_new/n_tests)
    )
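Since the asker is willing to give up the iterator, the element-by-element loop can also be replaced by broadcasting. The following is a sketch and not part of the posted answer: `distance_tensor` is a hypothetical helper, and it assumes the Euclidean example distance from the question. It builds the index grid once with `np.indices` and lets NumPy compute all pairwise distances in compiled code, at the cost of a temporary array `ndim` times the size of C.

```python
import numpy as np

def distance_tensor(size, ndim):
    # grid[k] holds the k-th coordinate of every grid point; shape (ndim, size, ..., size)
    grid = np.indices((size,) * ndim)
    # the first copy varies over the leading ndim axes of the result,
    # the second copy over the trailing ndim axes
    a = grid.reshape((ndim,) + (size,) * ndim + (1,) * ndim)
    b = grid.reshape((ndim,) + (1,) * ndim + (size,) * ndim)
    # broadcasting a - b gives every pairwise coordinate difference;
    # summing the squares over the coordinate axis gives the squared distance
    return np.sqrt(((a - b) ** 2).sum(axis=0))

D = distance_tensor(4, 2)
print(D.shape)
```

A kernel such as the answer's `f` could then be applied elementwise to the whole array. Note that the answer's `dist` returns the squared distance, while this sketch takes the square root as in the question.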