Running Python code in parallel from Rust with rust-cpython

Question

I'm trying to speed up a data pipeline using Rust. The pipeline contains bits of Python code that I don't want to modify, so I'm trying to run them as-is from Rust using rust-cpython and multiple threads. However, the performance is not what I expected; it's actually the same as running the Python code bits sequentially in a single thread.

Reading the documentation, I understand that when invoking the following, you actually get a pointer to a single Python interpreter that can only be created once, even if you run it from multiple threads separately.

    let gil = Python::acquire_gil();
    let py = gil.python();

If that's the case, it means the Python GIL is actually preventing all parallel execution in Rust as well. Is there a way to solve this problem?

Here is the code I tested:

use cpython::Python;
use std::thread;
use std::sync::mpsc;
use std::time::Instant;

#[test]
fn python_test_parallel() {
    let start = Instant::now();

    let (tx_output, rx_output) = mpsc::channel();
    let tx_output_1 = mpsc::Sender::clone(&tx_output);
    thread::spawn(move || {
        let gil = Python::acquire_gil();
        let py = gil.python();
        let start_thread = Instant::now();
        py.run("j=0\nfor i in range(10000000): j=j+i;", None, None).unwrap();
        println!("{:27} : {:6.1} ms", "Run time thread 1, parallel", (Instant::now() - start_thread).as_secs_f64() * 1000f64);
        tx_output_1.send(()).unwrap();
    });

    let tx_output_2 = mpsc::Sender::clone(&tx_output);
    thread::spawn(move || {
        let gil = Python::acquire_gil();
        let py = gil.python();
        let start_thread = Instant::now();
        py.run("j=0\nfor i in range(10000000): j=j+i;", None, None).unwrap();
        println!("{:27} : {:6.1} ms", "Run time thread 2, parallel", (Instant::now() - start_thread).as_secs_f64() * 1000f64);
        tx_output_2.send(()).unwrap();
    });

    // Receivers to ensure all threads run
    let _output_1 = rx_output.recv().unwrap();
    let _output_2 = rx_output.recv().unwrap();
    println!("{:37} : {:6.1} ms", "Total time, parallel", (Instant::now() - start).as_secs_f64() * 1000f64);
}

Answer

The CPython implementation of Python does not allow executing Python bytecode in multiple threads at the same time. As you note yourself, the global interpreter lock (GIL) prevents this.

We don't have any information on what exactly your Python code is doing, so I'll give a few general hints on how you could improve the performance of your code.

If your code is I/O-bound, e.g. reading from the network, you will generally get nice performance improvements from using multiple threads. Blocking I/O calls will release the GIL before blocking, so other threads can execute during that time.
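
To make this concrete in the same rust-cpython setup as the question, here is a minimal sketch; the time.sleep call is just a stand-in for any blocking operation. Because the sleep releases the GIL while it waits, the two threads overlap and the total wall time stays close to one second rather than two:

use cpython::Python;
use std::thread;
use std::time::Instant;

fn main() {
    let start = Instant::now();

    // Two threads, each running a blocking Python call.
    // time.sleep releases the GIL while waiting, so the threads
    // overlap instead of running one after the other.
    let handles: Vec<_> = (0..2)
        .map(|_| {
            thread::spawn(|| {
                let gil = Python::acquire_gil();
                let py = gil.python();
                py.run("import time; time.sleep(1.0)", None, None).unwrap();
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // Expected to print roughly 1000 ms, not 2000 ms.
    println!("Total: {:6.1} ms", start.elapsed().as_secs_f64() * 1000.0);
}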

Some libraries, e.g. NumPy, internally release the GIL during long-running library calls that don't need access to Python data structures. With these libraries, you can get performance improvements for multi-threaded, CPU-bound code even if you only write pure Python code using the library.
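
On the Rust side, rust-cpython exposes the same mechanism through Python::allow_threads, which releases the GIL while a Rust closure runs. A minimal sketch, with the function names and toy workload invented for illustration:

use cpython::Python;

// CPU-heavy work implemented in pure Rust; it touches no Python
// objects, so it is safe to run without holding the GIL.
fn expensive_rust_work(n: u64) -> u64 {
    (0..n).sum()
}

fn sum_range_without_gil(py: Python, n: u64) -> u64 {
    // allow_threads releases the GIL for the duration of the closure,
    // letting other Python threads run in the meantime. This is the
    // same trick NumPy uses internally for long-running calls.
    py.allow_threads(|| expensive_rust_work(n))
}

fn main() {
    let gil = Python::acquire_gil();
    let py = gil.python();
    println!("{}", sum_range_without_gil(py, 10_000_000));
}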

If your code is CPU-bound and spends most of its time executing Python bytecode, you can often use multiple processes rather than threads to achieve parallel execution. The multiprocessing module in the Python standard library helps with this.
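
From Rust, the same effect can be sketched by launching separate Python processes, each of which gets its own interpreter and therefore its own GIL. The python3 executable name and the inline script below are assumptions for illustration; inside Python itself, the multiprocessing module plays this role:

use std::process::Command;

fn main() {
    // CPU-bound snippet similar to the one in the question; each child
    // process runs it in its own interpreter, so the GILs do not interfere.
    let script = "j = 0\nfor i in range(10000000): j += i\nprint(j)";

    // Spawn two Python processes that run in parallel.
    let children: Vec<_> = (0..2)
        .map(|_| {
            Command::new("python3")
                .args(["-c", script])
                .spawn()
                .expect("failed to start python3")
        })
        .collect();

    // Wait for both to finish.
    for mut child in children {
        let status = child.wait().expect("failed to wait on child");
        assert!(status.success());
    }
}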

If your code is CPU-bound, spends most of its time executing Python bytecode and can't be run in parallel processes because it accesses shared data, you can't run it in multiple threads in parallel – the GIL prevents this. However, even without the GIL, you can't just run sequential code in parallel without changes in any language. Since you have concurrent access to some data, you need to add locking and possibly make algorithmic changes to prevent data races; the details of how to do this depend on your use case. (And if you don't have concurrent data access, you should use processes instead of threads – see above.)
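
As a small illustration of the locking point, in Rust a shared value can be protected with a mutex so that several threads may update it without data races; the counter here is just a made-up stand-in for whatever data the threads would actually share:

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Shared state guarded by a mutex to prevent data races.
    let counter = Arc::new(Mutex::new(0u64));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..1_000 {
                    // The lock is held only for the duration of each update.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    assert_eq!(*counter.lock().unwrap(), 4_000);
}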

Beyond parallelism, a good way to speed up Python code with Rust is to profile your Python code, find the hot spots where most of the time is spent, and rewrite these bits as Rust functions that you call from your Python code. If this doesn't give you enough of a speedup, you can combine this approach with parallelism – preventing data races is generally easier to achieve in Rust than in most other languages.
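
A minimal sketch of what such an extension might look like with rust-cpython; the module name hotspots, the function names, and the toy workload are invented, and the exact macro invocation depends on the rust-cpython version (this follows the 0.5-style py_module_initializer!/py_fn! pattern from the project README):

use cpython::{py_fn, py_module_initializer, PyResult, Python};

// Expose a Rust implementation of a hypothetical hot spot as a
// Python extension module named `hotspots`.
py_module_initializer!(hotspots, |py, m| {
    m.add(py, "__doc__", "Hot loops rewritten in Rust.")?;
    m.add(py, "sum_range", py_fn!(py, sum_range_py(n: u64)))?;
    Ok(())
});

// Plain Rust function that does the heavy lifting.
fn sum_range(n: u64) -> u64 {
    (0..n).sum()
}

// Thin wrapper adapting the Rust function to the Python calling convention.
fn sum_range_py(_py: Python, n: u64) -> PyResult<u64> {
    Ok(sum_range(n))
}

Built as a cdylib, such a module could then be imported from the unmodified Python pipeline (import hotspots) and called like any other Python function.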
