Multiprocessing Pool slow when calling external module


Problem description

My script is calling librosa module to compute Mel-frequency cepstral coefficients (MFCCs) for short pieces of audio. After loading the audio, I'd like to compute these (along with some other audio features) as fast as possible - hence multiprocessing.

Problem: the multiprocessing variant is much slower than the sequential one. Profiling says my code spends over 90% of its time on <method 'acquire' of '_thread.lock' objects>. That wouldn't be surprising with many small tasks, but in one test case I divide my audio into 4 chunks and process them in separate processes. I was thinking the overhead should be minimal, but in practice it's almost as bad as with many small tasks.

To my understanding, the multiprocessing module should fork almost everything, and there should not be any fighting over a lock. However, the results seem to show something different. Could it be that the librosa module keeps some sort of internal lock underneath?

My profiling results in plain text: https://drive.google.com/open?id=17DHfmwtVOJOZVnwIueeoWClUaWkvhTPc

As an image: https://drive.google.com/open?id=1KuZyo0CurHd9GjXge5CYQHOWn
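For reference, a profile like the one linked above can be reproduced with the standard library's cProfile; sorting by cumulative time is what surfaces entries such as the lock-acquire method. This is a minimal sketch with a placeholder workload, not the exact command used for the linked results:

```python
import cProfile
import io
import pstats


def work():
    # Placeholder for the pool.map call being profiled
    return sum(i * i for i in range(100000))


pr = cProfile.Profile()
pr.enable()
work()
pr.disable()

# Print the top 5 entries sorted by cumulative time
s = io.StringIO()
pstats.Stats(pr, stream=s).sort_stats('cumulative').print_stats(5)
print(s.getvalue())
```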

The code to reproduce the "problem":

import time
import numpy as np
import librosa
from functools import partial
from multiprocessing import Pool

n_proc = 4

y, sr = librosa.load(librosa.util.example_audio_file(), duration=60) # load audio sample
y = np.repeat(y, 10) # repeat signal so that we can get more reliable measurements
sample_len = int(sr * 0.2) # We will compute MFCC for short pieces of audio

def get_mfcc_in_loop(audio, sr, sample_len):
    # We split long array into small ones of length sample_len
    y_windowed = np.array_split(audio, np.arange(sample_len, len(audio), sample_len))
    for sample in y_windowed:
        mfcc = librosa.feature.mfcc(y=sample, sr=sr)

start = time.time()
get_mfcc_in_loop(y, sr, sample_len)
print('Time single process:', time.time() - start)

# Let's test now feeding these small arrays to pool of 4 workers. Since computing
# MFCCs for these small arrays is fast, I'd expect this to be not that fast
start = time.time()
y_windowed = np.array_split(y, np.arange(sample_len, len(y), sample_len))
with Pool(n_proc) as pool:
    func = partial(librosa.feature.mfcc, sr=sr)
    result = pool.map(func, y_windowed)
print('Time multiprocessing (many small tasks):', time.time() - start)

# Here we split the audio into 4 chunks and process them separately. This I'd expect
# to be fast and somehow it isn't. What could be the cause? Anything to do about it?
start = time.time()
y_split = np.array_split(y, n_proc)
with Pool(n_proc) as pool:
    func = partial(get_mfcc_in_loop, sr=sr, sample_len=sample_len)
    result = pool.map(func, y_split)
print('Time multiprocessing (a few large tasks):', time.time() - start)

Results on my machine:

  • Time single process: 8.48s
  • Time multiprocessing (many small tasks): 44.20s
  • Time multiprocessing (a few large tasks): 41.99s

Any ideas what's causing it? Better yet, how to make it better?

Recommended answer

To investigate what's happening, I ran top -H and noticed 60+ threads being spawned! That was it. It turns out librosa and its dependencies spawn many extra threads that together ruin the parallelism.
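The same thread count can be confirmed from inside the script. A rough sketch, assuming Linux (where /proc/self/task has one entry per thread of the current process; on other platforms, top -H as above is the way to check):

```python
import os


def thread_count():
    # On Linux, each thread of the current process appears as an
    # entry under /proc/self/task
    return len(os.listdir('/proc/self/task'))


# Calling this before and after the heavy imports (numpy, librosa)
# shows how many threads those libraries spawn on import/first use.
print('current thread count:', thread_count())
```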

The problem of oversubscription is well described in joblib docs. Let's use it then.

import time
import numpy as np
import librosa
from joblib import Parallel, delayed

n_proc = 4

y, sr = librosa.load(librosa.util.example_audio_file(), duration=60) # load audio sample
y = np.repeat(y, 10) # repeat signal so that we can get more reliable measurements
sample_len = int(sr * 0.2) # We will compute MFCC for short pieces of audio

def get_mfcc_in_loop(audio, sr, sample_len):
    # We split long array into small ones of length sample_len
    y_windowed = np.array_split(audio, np.arange(sample_len, len(audio), sample_len))
    for sample in y_windowed:
        mfcc = librosa.feature.mfcc(y=sample, sr=sr)

start = time.time()
y_windowed = np.array_split(y, np.arange(sample_len, len(y), sample_len))
Parallel(n_jobs=n_proc, backend='multiprocessing')(delayed(get_mfcc_in_loop)(audio=data, sr=sr, sample_len=sample_len) for data in y_windowed)
print('Time multiprocessing with joblib (many small tasks):', time.time() - start)


y_split = np.array_split(y, n_proc)
start = time.time()
Parallel(n_jobs=n_proc, backend='multiprocessing')(delayed(get_mfcc_in_loop)(audio=data, sr=sr, sample_len=sample_len) for data in y_split)
print('Time multiprocessing with joblib (a few large tasks):', time.time() - start)

Results:

  • Time multiprocessing with joblib (many small tasks): 2.66
  • Time multiprocessing with joblib (a few large tasks): 2.65

15x faster than using the multiprocessing module.
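An alternative mitigation, not part of the original answer: the oversubscription usually comes from native OpenMP/BLAS thread pools, and these can be capped via environment variables set before numpy/librosa are imported, so each Pool worker stays single-threaded and the Pool alone does the parallelism. A sketch:

```python
import os

# Cap the common native thread pools to one thread each.
# These must be set BEFORE importing numpy/scipy/librosa, since the
# libraries read them at import/initialization time.
os.environ['OMP_NUM_THREADS'] = '1'
os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['MKL_NUM_THREADS'] = '1'
os.environ['NUMEXPR_NUM_THREADS'] = '1'

import numpy as np  # imported only after the caps are in place

# From here on, a plain multiprocessing.Pool should no longer be
# fighting dozens of hidden worker threads per process.
```

Whether this fully matches joblib's numbers depends on which backend library is actually spawning the threads; joblib handles these caps for you, which is why the answer above reaches for it.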
