A parallel algorithm for order-preserving selection from an index table

Question

Order-preserving selection from an index table is trivial in serial code, but less straightforward with multiple threads, particularly if one wants to retain efficiency (the whole point of multi-threading) by avoiding linked lists. Consider the serial code:

#include <cstddef>
#include <vector>

template<typename T>
std::vector<T> select_in_order(
  std::vector<std::size_t> const&keys, // permutation of 0 ... keys.size()-1
  std::vector<T> const&data)           // anything copyable
{ // select data[keys[i]] in key order, allowing keys.size() >= data.size()
  std::vector<T> result;
  for(auto key:keys)
    if(key<data.size())                // skip keys that do not index into data
      result.push_back(data[key]);
  return result;
}

How can I do this multi-threaded (say, using TBB or even OpenMP), in particular if data.size() < keys.size()?

Recommended answer

The parallel computing operation you are looking for is called Stream Compaction (http://stackoverflow.com/questions/8388125/cuda-stream-compaction-understanding-the-concept).

It can be implemented efficiently in parallel, though the algorithm is non-trivial. Your best bet is to use a library that already implements it, such as Thrust. If you really want to implement it yourself, an explanation of the algorithm can be found in GPU Gems 3, Chapter 39.3.1, or alternatively in Udacity's Intro to Parallel Programming course, Lesson 4.5.

Essentially, it involves defining a boolean predicate for your array (in your example, key < data.size()), mapping it to a separate array of flags, taking an exclusive Scan (prefix sum) over the flag array to obtain each selected element's output position, and then doing a Scatter to those positions.
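The three stages can be sketched in plain C++ as follows. This is a minimal illustration rather than a tuned implementation: the function name select_in_order_compact is made up for this example, every loop is written serially so that the data flow stays visible, and the result is pre-sized, which assumes T is default-constructible (the serial version in the question does not need that).

```cpp
#include <cstddef>
#include <vector>

// Sketch of stream compaction as map -> exclusive scan -> scatter.
// select_in_order_compact is a hypothetical name used only for illustration.
template <typename T>
std::vector<T> select_in_order_compact(std::vector<std::size_t> const& keys,
                                       std::vector<T> const& data)
{
  std::size_t const n = keys.size();

  // Map: flag[i] is 1 if keys[i] selects an element of data, 0 otherwise.
  // Iterations are independent, so this loop parallelizes directly.
  std::vector<std::size_t> flag(n);
  for (std::size_t i = 0; i < n; ++i)
    flag[i] = keys[i] < data.size() ? 1 : 0;

  // Exclusive scan: pos[i] is the number of selected keys before position i,
  // i.e. the output index of keys[i] if it is selected. Written serially
  // here; C++17's std::exclusive_scan computes the same thing.
  std::vector<std::size_t> pos(n);
  std::size_t count = 0;
  for (std::size_t i = 0; i < n; ++i) {
    pos[i] = count;
    count += flag[i];
  }

  // Scatter: each selected element goes to its precomputed slot. No two
  // iterations write the same slot, so this loop also parallelizes directly.
  std::vector<T> result(count);  // assumption: T is default-constructible
  for (std::size_t i = 0; i < n; ++i)
    if (flag[i])
      result[pos[i]] = data[keys[i]];
  return result;
}
```

Because every output position is known before the scatter starts, the original order is preserved no matter which thread ends up writing which element.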

Map() and Scatter() are easy to implement in parallel; Scan() is the non-trivial part. Most parallel libraries provide a Scan() implementation; if yours does not, both of the references above describe several parallel scan algorithms.
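If no library scan is available, one simple CPU-oriented variant is a blocked, three-pass scan. The sketch below is one possible arrangement and not necessarily one of the algorithms described in the references above; the name blocked_exclusive_scan and the num_blocks parameter are assumptions made for this example. The OpenMP pragmas take effect only when OpenMP is enabled at compile time (for example with -fopenmp); otherwise they are ignored and the code runs serially with the same result.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Blocked exclusive prefix sum (hypothetical helper, for illustration only).
std::vector<std::size_t> blocked_exclusive_scan(
    std::vector<std::size_t> const& in, std::size_t num_blocks)
{
  std::size_t const n = in.size();
  std::vector<std::size_t> out(n);
  if (n == 0) return out;

  num_blocks = std::max<std::size_t>(1, std::min(num_blocks, n));
  std::size_t const block = (n + num_blocks - 1) / num_blocks;  // ceil(n / blocks)
  std::ptrdiff_t const nb = static_cast<std::ptrdiff_t>(num_blocks);

  // Pass 1 (parallel): each block sums its own elements.
  std::vector<std::size_t> block_sum(num_blocks, 0);
  #pragma omp parallel for
  for (std::ptrdiff_t b = 0; b < nb; ++b) {
    std::size_t const idx = static_cast<std::size_t>(b);
    std::size_t const begin = std::min(idx * block, n);
    std::size_t const end = std::min(begin + block, n);
    std::size_t sum = 0;
    for (std::size_t i = begin; i < end; ++i) sum += in[i];
    block_sum[idx] = sum;
  }

  // Pass 2 (serial, only num_blocks steps): starting offset of each block.
  std::vector<std::size_t> offset(num_blocks, 0);
  for (std::size_t b = 1; b < num_blocks; ++b)
    offset[b] = offset[b - 1] + block_sum[b - 1];

  // Pass 3 (parallel): local exclusive scan inside each block,
  // starting from that block's offset.
  #pragma omp parallel for
  for (std::ptrdiff_t b = 0; b < nb; ++b) {
    std::size_t const idx = static_cast<std::size_t>(b);
    std::size_t const begin = std::min(idx * block, n);
    std::size_t const end = std::min(begin + block, n);
    std::size_t running = offset[idx];
    for (std::size_t i = begin; i < end; ++i) {
      out[i] = running;
      running += in[i];
    }
  }
  return out;
}
```

This reads the input twice, so it only pays off when the array is large enough for the parallel passes to outweigh the extra traffic.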

This all assumes you have many cores, as on a GPU. On a CPU, it would probably be faster either to do it serially, or to divide the array into large chunks, process each chunk serially (with different chunks on different cores in parallel), and merge the results back together. Which approach is best depends on your data (the former works better if most keys are expected to be in the final array).
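The chunked CPU approach could look roughly like the sketch below. The name select_in_order_chunked and the num_chunks parameter are hypothetical, and OpenMP is just one way to run the chunks concurrently (TBB or std::thread would serve equally well). As before, the pragma is ignored when OpenMP is not enabled, and the result is unchanged.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Chunked order-preserving selection (hypothetical helper, for illustration).
template <typename T>
std::vector<T> select_in_order_chunked(std::vector<std::size_t> const& keys,
                                       std::vector<T> const& data,
                                       std::size_t num_chunks)
{
  std::size_t const n = keys.size();
  num_chunks = std::max<std::size_t>(1, std::min(num_chunks, n));
  std::size_t const chunk = (n + num_chunks - 1) / num_chunks;  // ceil(n / chunks)

  // Each chunk filters its own contiguous slice of keys into a private
  // vector, so the threads never write to shared state.
  std::vector<std::vector<T>> partial(num_chunks);
  #pragma omp parallel for
  for (std::ptrdiff_t c = 0; c < static_cast<std::ptrdiff_t>(num_chunks); ++c) {
    std::size_t const idx = static_cast<std::size_t>(c);
    std::size_t const begin = std::min(idx * chunk, n);
    std::size_t const end = std::min(begin + chunk, n);
    for (std::size_t i = begin; i < end; ++i)
      if (keys[i] < data.size())
        partial[idx].push_back(data[keys[i]]);
  }

  // Merge: concatenating the partial results in chunk order restores
  // the global key order.
  std::size_t total = 0;
  for (auto const& p : partial) total += p.size();
  std::vector<T> result;
  result.reserve(total);
  for (auto const& p : partial)
    result.insert(result.end(), p.begin(), p.end());
  return result;
}
```

The merge here is serial. Since each partial size is known after the first phase, the chunks could instead be copied to precomputed offsets in parallel, at the cost of pre-sizing the result.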
