Thread IDs with PPL and Parallel Memory Allocation


Question

I have a question about the Microsoft PPL library, and about parallel programming in general. I am using FFTW to perform a large set (100,000) of 64 x 64 x 64 FFTs and inverse FFTs. In my current implementation, I use a parallel for loop and allocate the storage arrays inside the loop. I have noticed that my CPU usage tops out at only about 60-70% in these cases. (Note that this is still better utilization than the built-in threaded FFTs provided by FFTW, which I have also tested.) Since I am using fftw_malloc, is it possible that excessive locking is occurring and preventing full utilization?

In light of this, is it advisable to preallocate the storage arrays for each thread before the main processing loop, so that no locks are required within the loop itself? If so, how can this be done with the Microsoft PPL library? I have used OpenMP before, and there it is simple enough to get a thread ID using the supplied functions. However, I have not seen a similar function in the PPL documentation.

Recommended answer

I am only answering this because nobody has posted anything yet.

Mutexes can wreak havoc on performance if heavy locking is required. In addition, if a lot of memory allocation or reallocation is needed, that can also decrease performance and leave you limited by memory bandwidth. As you said, preallocating storage that the threads later operate on can be useful. However, this requires a fixed thread count and a workload that is spread evenly across all threads.
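To make the preallocation idea concrete, here is a minimal sketch in standard C++ (plain std::thread rather than PPL or OpenMP, so it stays portable). Everything in it is an assumption for illustration: process_item is a hypothetical stand-in for one FFT and inverse FFT, std::vector stands in for buffers obtained from fftw_malloc, and run_preallocated is an invented name. It shows a fixed thread count, one scratch buffer per thread allocated before the loop, and a static, evenly sized split of the items.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical stand-in for one transform: fills the scratch buffer from the
// item index and returns the sum of its contents.
double process_item(std::size_t item, std::vector<double>& scratch) {
    for (std::size_t i = 0; i < scratch.size(); ++i) {
        scratch[i] = static_cast<double>(item + i);
    }
    return std::accumulate(scratch.begin(), scratch.end(), 0.0);
}

// Processes item_count items on thread_count threads. Each thread owns one
// preallocated buffer, so nothing is allocated inside the processing loop.
std::vector<double> run_preallocated(std::size_t item_count,
                                     std::size_t thread_count,
                                     std::size_t buffer_size) {
    assert(thread_count > 0);
    std::vector<double> results(item_count, 0.0);

    // One scratch buffer per worker, allocated before any thread starts.
    std::vector<std::vector<double>> buffers(
        thread_count, std::vector<double>(buffer_size));

    // Static partition: each worker gets one contiguous chunk of items.
    const std::size_t chunk = (item_count + thread_count - 1) / thread_count;
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < thread_count; ++t) {
        const std::size_t begin = std::min(item_count, t * chunk);
        const std::size_t end = std::min(item_count, begin + chunk);
        workers.emplace_back([&results, &buffers, t, begin, end] {
            for (std::size_t item = begin; item < end; ++item) {
                results[item] = process_item(item, buffers[t]);
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return results;
}
```

Calling run_preallocated(100000, std::thread::hardware_concurrency(), 64 * 64 * 64) would mirror the sizes in the question, provided hardware_concurrency() does not return 0. The static split is what ties the sketch to the caveat above: if some items take longer than others, the threads that finish early sit idle.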

Concerning a thread ID function in PPL, I can only speak about Intel TBB, which should however be pretty similar to PPL. TBB, and I suppose PPL as well, does not speak of threads directly but of tasks. The aim of TBB was to abstract these underlying details away from the user, so it does not provide a thread_id function.
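When a task-based library does not expose a thread ID, one possible workaround is to let each worker thread lazily create its own scratch buffer on first use instead of indexing a preallocated array. The sketch below uses the standard C++ thread_local keyword and plain std::thread workers pulling items from a shared counter; it is an illustration only, and the names local_scratch, run_with_thread_local and g_allocations are hypothetical. PPL itself ships concurrency::combinable<T>, whose local() member returns a per-thread copy and could play a similar role, but the code here does not depend on PPL.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Counts how many scratch buffers were actually allocated (illustration only).
std::atomic<int> g_allocations{0};

// Hypothetical helper: returns the calling thread's scratch buffer and
// allocates it the first time that thread asks for it.
std::vector<double>& local_scratch(std::size_t size) {
    thread_local std::vector<double> scratch;
    if (scratch.size() != size) {
        scratch.assign(size, 0.0);
        ++g_allocations;
    }
    return scratch;
}

// Workers pull item indices from a shared atomic counter, so the code never
// needs to know which thread runs which item. Returns the number of items
// processed.
std::size_t run_with_thread_local(std::size_t item_count,
                                  std::size_t thread_count,
                                  std::size_t buffer_size) {
    assert(thread_count > 0);
    assert(buffer_size > 0);
    std::atomic<std::size_t> next{0};
    std::atomic<std::size_t> processed{0};

    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < thread_count; ++t) {
        workers.emplace_back([&] {
            for (;;) {
                const std::size_t item = next.fetch_add(1);
                if (item >= item_count) {
                    break;
                }
                // Reuses this thread's buffer; allocates only on first use.
                std::vector<double>& scratch = local_scratch(buffer_size);
                scratch[0] = static_cast<double>(item);  // placeholder work
                ++processed;
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return processed.load();
}
```

With this pattern the number of allocations is bounded by the number of worker threads rather than by the number of items, and the work is handed out dynamically, so it does not rely on the workload being evenly balanced in advance.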
