How to make Python code with two for loops run faster (is there a Python way of doing Mathematica's Parallelize)?

Problem description

I am completely new to Python or any such programming language. I have some experience with Mathematica. I have a mathematical problem which Mathematica solves with its own 'Parallelize' method, but this leaves the system quite exhausted after using all the cores; I can barely use the machine during the run. Hence, I was looking for a coding alternative and found Python fairly easy to learn and implement. So without further ado, let me describe the mathematical problem and the issues with my Python code. As the full code is too long, let me give an outline.

1. Numerically solve a differential equation of the form y''(t) + f(t) y(t) = 0 to obtain y(t) over some range, say C <= t <= D.

2. Next, interpolate the numerical result over some desired range, say A <= t <= B, to obtain a function w(t).

3. Use w(t) to solve another differential equation of the form z''(t) + [a + b w(t)] z(t) = 0 for some range of a and b, for which I am using the loop below.

4. Define F = 1 + sol1[157] to make a list like {a, b, F}. Let me give a prototype loop, as this takes most of the computation time.

for q in np.linspace(0.0, 4.0, 100):
    for a in np.linspace(-2.0, 7.0, 100):
        print('Solving for q = {}, a = {}'.format(q, a))
        sol1 = odeint(fun, [1, 0], t, args=(a, q))[..., 0]
        print(t[157])
        F = 1 + sol1[157]
        f1.write("{}  {} {} \n".format(q, a, F))
# close the output file only after both loops have finished
f1.close()

Now, the real loop takes about 4 hours and 30 minutes to complete (with some built-in functional form of w(t), it takes about 2 minutes). When I applied numba/autojit before the definition of fun in my code (without properly understanding what it does and how!), the run time improved significantly, to about 2 hours and 30 minutes. Also, writing the two loops as itertools/product reduces the run time further, but only by about 2 minutes! However, Mathematica, when I let it use all 4 cores, finishes the task within 30 minutes.

So, is there a way to improve the run time in Python?

Recommended answer

To speed up Python, you have three options:

  • deal with specific bottlenecks in the program (as suggested in @LutzL's comment)
  • try to speed up the code by compiling it into C using cython (or by including C code using weave or similar techniques). Since the time-consuming computations in your case are not in Python code proper but in scipy modules (at least I believe they are), this would not help you very much here.
  • implement multiprocessing (https://docs.python.org/3/library/multiprocessing.html), as you suggested in your original question. If you have X cores, this can speed up your code by a factor of up to (in practice slightly less than) X. Unfortunately, this is rather complicated in Python.
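For the first option, a profiler can show where the time actually goes before anything is rewritten. The following is a minimal sketch using the standard library's cProfile and pstats modules; the functions inner_work and outer_loop are hypothetical stand-ins for the real computation, not code from the question.

```python
import cProfile
import io
import pstats


def inner_work(n):
    # Hypothetical stand-in for the expensive step (e.g. one ODE solve).
    return sum(i * i for i in range(n))


def outer_loop(repeats, n):
    # Hypothetical stand-in for the nested parameter loops.
    total = 0
    for _ in range(repeats):
        total += inner_work(n)
    return total


def profile_call(func, *args):
    # Run func under the profiler and return its result together with a
    # text report of the five entries with the largest cumulative time.
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


if __name__ == "__main__":
    result, report = profile_call(outer_loop, 20, 1000)
    print(report)
```

If nearly all of the cumulative time sits inside a library call such as the ODE solver, compiling the surrounding Python loop is unlikely to help much, which points towards the third option.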

Implementing multiprocessing, using the prototype loop from the original question as an example

I assume that the computations inside the nested loops of your prototype code are actually independent of one another. Since your prototype code is incomplete, I am not sure this is the case; if it is not, this approach will of course not work. The example below does not use your differential equation problem for the fun function, but a prototype with the same signature (input and output variables).

import queue
import numpy as np
import scipy.integrate
import multiprocessing as mp

def fun(y, t, b, c):
    # replace this function with whatever function you want to work with
    #    (this one is the example function from the scipy docs for odeint)
    theta, omega = y
    dydt = [omega, -b*omega - c*np.sin(theta)]
    return dydt

# definitions of the work and write functions
# (they are called "threads" here, but each one runs in its own process)

def run_thread(input_queue, output_queue):
    # run threads will pull tasks from the input_queue, push results into output_queue
    while True:
        try:
            queueitem = input_queue.get(block=False)
        except queue.Empty:
            # all tasks are queued before the workers start, so an empty queue means we are done
            print("Queue exhausted, terminating")
            break
        if len(queueitem) == 3:
            a, q, t = queueitem
            sol1 = scipy.integrate.odeint(fun, [1, 0], t, args=(a, q))[..., 0]
            F = 1 + sol1[157]
            output_queue.put((q, a, F))

def write_thread(output_queue):
    # write thread will pull results from output_queue, write them to outputfile.txt
    with open("outputfile.txt", "w") as f1:
        while True:
            # a blocking get() waits for the next result instead of busy-polling an empty queue
            queueitem = output_queue.get()
            if queueitem[0] == "TERMINATE":
                break
            q, a, F = queueitem
            print("{}  {} {} \n".format(q, a, F))
            f1.write("{}  {} {} \n".format(q, a, F))

# define time point sequence            
t = np.linspace(0, 10, 201)

# prepare input and output Queues
mpM = mp.Manager()
input_queue = mpM.Queue()
output_queue = mpM.Queue()

# prepare tasks, collect them in input_queue
for q in np.linspace(0.0, 4.0, 100):
    for a in np.linspace(-2.0, 7.0, 100):
        # Your computations as commented here will now happen in run_threads as defined above and created below
        # print('Solving for q = {}, a = {}'.format(q,a))
        # sol1 = scipy.integrate.odeint(fun, [1, 0], t, args=( a, q))[..., 0]
        # print(t[157])
        # F = 1 + sol1[157]    
        input_tupel = (a, q, t)
        input_queue.put(input_tupel)

# create threads
thread_number = mp.cpu_count()
procs_list = [mp.Process(target = run_thread , args = (input_queue, output_queue)) for i in range(thread_number)]         
write_proc = mp.Process(target = write_thread, args = (output_queue,))

# start threads
for proc in procs_list:
    proc.start()
write_proc.start()

# wait for run_threads to finish
for proc in procs_list:
    proc.join()

# terminate write_thread
output_queue.put(("TERMINATE",))
write_proc.join()

Explanation

  • We define the individual problems (or rather their parameters) before commencing computation; we collect them in an input Queue.
  • We define a function (run_thread) that is run in the worker processes (called threads in the code). This function computes individual problems until there are none left in the input Queue; it pushes the results into an output Queue.
  • We start as many such worker processes as we have CPUs.
  • We start one additional process (write_thread) that collects the results from the output Queue and writes them to a file.

Caveats

  • For smaller problems, you can run multiprocessing without Queues. However, if the number of individual computations is large and each one gets its own process, you will exceed the maximum number of processes the kernel allows, after which the kernel kills your program.
  • Multiprocessing works differently on different operating systems. The example above will work on Linux (perhaps also on other Unix-like systems such as Mac and BSD), but not on Windows. The reason is that Windows does not have a fork() system call, so child processes re-import the main module there, and the code that creates the processes would have to be placed under an if __name__ == "__main__": guard. (I do not have access to a Windows machine and therefore cannot try to implement it for Windows.)
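As a sketch of the queue-free route mentioned in the first caveat, multiprocessing.Pool keeps a fixed number of worker processes and hands them tasks, so the process count never grows with the number of computations. This is not the code from the answer above. To stay dependency-free, solve_one is a hypothetical stand-in for the odeint call and linspace is a small pure-Python substitute for np.linspace; both are assumptions made for illustration. The if __name__ == "__main__": guard is what start methods other than fork require.

```python
import math
import multiprocessing as mp
from itertools import product


def solve_one(params):
    # Hypothetical stand-in for the expensive step; in the real problem this
    # would be scipy.integrate.odeint(fun, [1, 0], t, args=(a, q))[..., 0].
    q, a = params
    F = 1 + math.cos(q) * math.exp(-0.1 * a)
    return q, a, F


def linspace(start, stop, num):
    # Pure-Python substitute for np.linspace (assumes num >= 2).
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def build_tasks(n_q=100, n_a=100):
    # Same parameter grid as the nested loops in the question.
    return list(product(linspace(0.0, 4.0, n_q), linspace(-2.0, 7.0, n_a)))


def run_grid(tasks, processes=None):
    # Pool.map distributes the tasks over the worker processes and returns
    # the results in task order; processes=None uses one worker per CPU.
    with mp.Pool(processes=processes) as pool:
        return pool.map(solve_one, tasks, chunksize=50)


if __name__ == "__main__":
    # Only the main process creates the pool; workers that re-import this
    # module skip this block.
    results = run_grid(build_tasks())
    for q, a, F in results[:3]:
        print("{}  {} {}".format(q, a, F))
```

Because Pool.map returns all results to the parent process, the output file can then be written by that single process in the same format as in the original loop, without a separate writer process.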
