multithread vs list comprehension in simulation
Problem description

Suppose we have a recurrence relation
A[0] = a
A[i+1] = f(A[i],c)
where c is a parameter and f is some function, say

def f(x, c):
    return sin(x) + c
Assume that, for a given a, we want to evaluate A[i] for i in range(0, n) for each c in Cs = [c[j] for j in range(0, m)], where n and m are fairly large.
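For concreteness, the straightforward nested-loop evaluation looks like the following sketch (a, n, m and Cs are small placeholder values here, purely for illustration):

```python
from math import sin

def f(x, c):
    return sin(x) + c

# placeholder values; in practice n and m would be large
a, n, m = 0.5, 5, 3
Cs = [0.1 * j for j in range(m)]

# A[j][i] holds the i-th term of the sequence for parameter Cs[j]
A = [[a] * n for _ in range(m)]
for j in range(m):
    for i in range(n - 1):
        A[j][i + 1] = f(A[j][i], Cs[j])
```

The question is how to organize this double loop when m × n evaluations become expensive.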
Question in a few words: should I use multithreading over each c, or a list comprehension over each i?

Let me explain. I am considering the following two approaches:
Approach 1:

Store the A[i] in a 2D array, with each row containing the values of the sequence for a fixed c[j]. But store the array in column-major order, so that the A[i] for a constant i are contiguous.
Then use a list comprehension to compute them:

for i in range(0, n - 1):
    A[:, i + 1] = [f(x, c) for x, c in zip(A[:, i], Cs)]
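Incidentally, when f is an elementwise function such as sin(x) + c, the per-i list comprehension can be replaced by a vectorized NumPy update over the whole column, which tends to beat both approaches; a minimal sketch with placeholder sizes (a, n, m and Cs are illustrative values):

```python
import numpy as np

# placeholder sizes; n and m would be large in practice
a, n, m = 0.5, 5, 3
Cs = np.array([0.1 * j for j in range(m)])

A = np.empty((m, n), order='F')  # column-major, so A[:, i] is contiguous
A[:, 0] = a
for i in range(n - 1):
    # np.sin applies elementwise, updating the whole column at once
    A[:, i + 1] = np.sin(A[:, i]) + Cs
```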
Approach 2:

Store the sequences as before in a 2D array, but this time in row-major order. Have a function that, given a row index, fills that row with the values of the sequence for the corresponding c:
def sequence(j):
    A[j, 0] = a
    for i in range(0, n - 1):
        A[j, i + 1] = f(A[j, i], Cs[j])
And then call sequence for different j in different processes using, say, multiprocessing.Pool.
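A runnable sketch of this approach (a, n, m and Cs are small illustrative values). One caveat worth noting: with a real multiprocessing.Pool, each worker process gets its own copy of A, so in-place writes to the array are lost; the thread-backed multiprocessing.dummy.Pool is used below so that sequence actually fills the shared array:

```python
from math import sin
from multiprocessing.dummy import Pool  # thread pool: workers share A
import numpy as np

a, n, m = 0.5, 5, 3  # placeholder sizes
Cs = [0.1 * j for j in range(m)]
A = np.empty((m, n))  # row-major: row j is the sequence for Cs[j]

def f(x, c):
    return sin(x) + c

def sequence(j):
    # fill row j with the sequence for parameter Cs[j]
    A[j, 0] = a
    for i in range(n - 1):
        A[j, i + 1] = f(A[j, i], Cs[j])

pool = Pool()
pool.map(sequence, range(m))
pool.close()
pool.join()
```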
Which of the two approaches should I prefer?
Experiment:

I tried the following test:
import numpy
from multiprocessing.dummy import Pool
from multiprocessing import cpu_count
import time

def func(x):
    N = 400
    A = numpy.array([[i * j for i in range(0, N)] for j in range(0, N)])
    h = numpy.array([x for i in range(0, N)])
    y = numpy.dot(A, h.transpose())
    return y[-1]

start_time = time.time()

def multiproc():
    print('Multiple processes')
    print(cpu_count())
    mypool = Pool(cpu_count())
    print(mypool.map(func, [i for i in range(0, 100)]))

def multiproc2():
    print('Multiple processes 2')
    pool = Pool(cpu_count())
    res = numpy.empty(100)
    for i in range(0, 100):
        # note: calling get() immediately blocks, serializing the work
        res[i] = pool.apply_async(func, (i,)).get()
    pool.close()
    pool.join()
    print(res)

def singleproc():
    print('Single process')
    for i in range(0, 100):
        print(func(i))

funcs = [multiproc, singleproc, multiproc2]
funcs[1]()
print("%.6f seconds" % (time.time() - start_time))
Changing the call funcs[1]() to funcs[0]() or funcs[2](), we get pretty much the same time in every case.
Answer

I would prefer using the Pool wrapper, as it definitely seems better than the thread approach. Try this:
from math import sin
from multiprocessing import Pool
import numpy as np

def f(x, c):
    return sin(x) + c

# a, n, m and Cs are as defined in the question
A = np.zeros(shape=(n, m))  # row i holds the i-th term for every c
A[0, :] = a
for i in range(n - 1):
    pool = Pool()
    res = []
    for j in range(m):
        res.append(pool.apply_async(f, (A[i, j], Cs[j])))
    pool.close()
    pool.join()
    for j in range(m):
        A[i + 1, j] = res[j].get()
You can always time the two approaches and see which one is fastest with:

import time
start_time = time.time()
# your code
print("%.6f seconds" % (time.time() - start_time))

It is not very accurate, but it should be enough for your purpose.