multithread vs list comprehension in simulation


Problem description

Suppose we have a recurrence relation

A[0] = a
A[i+1] = f(A[i],c)

where c is a parameter and f is some function, say

def f(x, c):
    return sin(x) + c

Assume that, for a given a, we want to evaluate A[i] for i in range(0,n) for c in Cs = [c[j] for j in range(0, m)], where n and m are fairly large.
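For concreteness, a brute-force evaluation of the recurrence could look like the following sketch. The values of a, n, m and Cs are hypothetical, since the question leaves them unspecified:

```python
from math import sin

def f(x, c):
    return sin(x) + c

# Example values (hypothetical; the question leaves a, n, m unspecified)
a = 0.0
n, m = 5, 3
Cs = [0.1 * j for j in range(m)]

# A[j][i] holds the i-th term of the sequence for parameter Cs[j]
A = [[0.0] * n for _ in range(m)]
for j in range(m):
    A[j][0] = a
    for i in range(n - 1):
        A[j][i + 1] = f(A[j][i], Cs[j])
```

Both approaches below are just different schedules for filling in this same table.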

Question in a few words: Should I use multithreading for each c or list comprehension for each i.

Let me explain. I am considering the following two approaches:

Approach 1:

Store the A[i] in a 2D array with each row containing the values of the sequence for a fixed c[i]. But store the array in column-major order, such that the A[i] for a constant i are contiguous.

Then use list comprehension to compute them

for i in range(0, n - 1):
    A[:, i+1] = [f(x, c) for x, c in zip(A[:, i], Cs)]
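A self-contained sketch of Approach 1, with assumed example values for a, n, m and Cs. Since np.sin is a ufunc, f here accepts whole columns at once, so each step can even be a single vectorized update instead of a list comprehension:

```python
import numpy as np

# Example values (hypothetical; the question leaves a, n, m unspecified)
a = 0.5
n, m = 6, 4
Cs = np.linspace(0.0, 1.0, m)

def f(x, c):
    # np.sin works elementwise, so f accepts whole columns at once
    return np.sin(x) + c

A = np.empty((m, n), order='F')  # column-major: each column A[:, i] is contiguous
A[:, 0] = a
for i in range(n - 1):
    # one vectorized update per step instead of a per-c list comprehension
    A[:, i + 1] = f(A[:, i], Cs)
```

When f is built from NumPy ufuncs like this, the vectorized column update usually beats both the list comprehension and any thread/process pool for cheap per-element work.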

Approach 2:

Store the sequences as before in a 2D array, but in row-major order this time.

Have a function that, given a row, fills it with the values of the sequence for a given c.

def sequence(j):
    A[j, 0] = a
    for i in range(0, n - 1):
        A[j, i+1] = f(A[j, i], Cs[j])

And then call sequence for different j in different processes using, say, multiprocessing.Pool.
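A runnable sketch of Approach 2, with example values assumed. One caveat worth noting: with multiprocessing, each worker runs in its own address space, so a worker writing into a shared global A would only mutate its own copy. The sketch therefore has each worker return its row and assembles them in the parent:

```python
from math import sin
from multiprocessing import Pool
import numpy as np

# Example values (hypothetical; not specified in the question)
a = 0.5
n, m = 6, 4
Cs = [0.25 * j for j in range(m)]

def f(x, c):
    return sin(x) + c

def sequence(j):
    # Build and return row j; with separate processes, writing into a
    # shared global array here would not be visible to the parent.
    row = np.empty(n)
    row[0] = a
    for i in range(n - 1):
        row[i + 1] = f(row[i], Cs[j])
    return row

if __name__ == '__main__':
    with Pool() as pool:
        A = np.vstack(pool.map(sequence, range(m)))  # A[j] is the run for Cs[j]
```

Each row is independent of the others, so this parallelizes cleanly; the per-task payload (one whole row) is also large enough to amortize the pickling overhead.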

Which of the two approaches should I prefer?

Experiment:

I tried the following test

import numpy
from multiprocessing.dummy import Pool  # thread-based Pool (same API as the process Pool)
from multiprocessing import cpu_count
import time

def func(x):
    N = 400
    A = numpy.array([[i*j for i in range(0,N)] for j in range(0, N)])
    h = numpy.array([x for i in range(0, N)])
    y = numpy.dot(A, h.transpose())
    return y[-1]

start_time = time.time()

def multiproc():
    print('Multiple processes')
    print(cpu_count())
    mypool = Pool(cpu_count())
    print(mypool.map(func, [i for i in range(0,100)]))


def multiproc2():
    print('Multiple processes 2')
    pool = Pool(cpu_count())
    res = numpy.empty(100)
    for i in range(0,100):
        # .get() blocks right away, so these tasks actually run one at a time
        res[i] = pool.apply_async(func, (i,)).get()
    pool.close()
    pool.join()
    print(res)

def singleproc():
    print('Single process')
    for i in range(0,100):
        print(func(i))

funcs = [multiproc, singleproc, multiproc2]

funcs[1]()

print("%.6f seconds" % (time.time() - start_time))

Changing the call funcs[1]() to funcs[0]() or funcs[2](), we get pretty much the same time in every case.

Answer

I would prefer using the Pool wrapper, as it definitely seems better than the thread approach. Try this:

from math import sin
from multiprocessing import Pool
import numpy as np

def f(x, c):
    return sin(x) + c

# a, n, m and Cs are assumed defined as in the question, e.g.:
# a, n, m = 0.0, 100, 50; Cs = [0.1 * j for j in range(m)]

A = np.zeros(shape=(n, m))  # n steps down the rows, one column per c
A[0, :] = a
pool = Pool()               # create the pool once rather than per step
for i in range(n - 1):
    res = [pool.apply_async(f, (A[i, j], Cs[j])) for j in range(m)]
    for j in range(m):
        A[i + 1, j] = res[j].get()
pool.close()
pool.join()

You can always time the two approaches and see which one is the fastest with:

import time
start_time = time.time()
# your code
print("%.6f seconds" % (time.time() - start_time))
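For a less noisy measurement, the standard timeit module can repeat the call and let you take the best of several runs. The work function here is a hypothetical stand-in for whichever variant is being measured:

```python
import timeit

def work():
    # stand-in for the simulation step being measured
    return sum(i * i for i in range(10000))

# 5 runs of 100 calls each; the minimum is the least noisy estimate
best = min(timeit.repeat(work, number=100, repeat=5))
```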

It is not very accurate, but it should be enough for your purpose.
