OpenCL multiple command queues for concurrent NDRange kernel launch

Views: 515 Published: 2020/5/20 18:59:33 Tags: concurrency opencl gpu gpgpu multi-gpu


Question

I'm trying to run a vector addition application in which I need to launch multiple kernels concurrently. Someone answering my last question advised me to use multiple command queues for concurrent kernel launch, so I am defining them as an array:

context = clCreateContext(NULL, 1, &device_id, NULL, NULL, &err);
for (i = 0; i < num_ker; ++i)
{
    queue[i] = clCreateCommandQueue(context, device_id, 0, &err);
}

I'm getting the error "command terminated by signal 11" somewhere around the above code.

I'm also using a for loop to launch the kernels and to enqueue the data:

for (i = 0; i < num_ker; ++i)
{
    err = clEnqueueNDRangeKernel(queue[i], kernel, 1, NULL, &globalSize, &localSize,
                                 0, NULL, NULL);
}

I'm not sure where I'm going wrong. I saw somewhere that we can make an array of command queues, which is why I'm using an array. One more piece of information: when I don't use a for loop and just define multiple command queues manually, it works fine.

Recommended answer

I also read your last question, and I think you should first rethink what you really want to do and whether OpenCL is really the way to do it.

OpenCL is an API for massively parallel processing and data crunching, where each kernel (or queued task) operates in parallel on many data values at the same time and can therefore outperform serial CPU processing by orders of magnitude on suitable workloads.

The typical use case for OpenCL is one kernel running millions of work items. More advanced applications may need multiple sequences of different kernels and special synchronization between the CPU and the GPU.

But concurrency is never a requirement. (Otherwise a single-core CPU would not be able to perform the task, and that is never the case. It will be slower, but it will still be possible to run it.)

Even if two tasks need to run at the same time, the total time taken will be the same whether they run concurrently or not:

Non-concurrent case:

Kernel 1: *
Kernel 2: -
GPU Core 1: *****-----
GPU Core 2: *****-----
GPU Core 3: *****-----
GPU Core 4: *****-----

Concurrent case:

Kernel 1: *
Kernel 2: -
GPU Core 1: **********
GPU Core 2: **********
GPU Core 3: ----------
GPU Core 4: ----------

In fact, the non-concurrent case is preferable, since at least the first task has already completed and further processing on its results can continue.
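The two diagrams above can be checked with a little arithmetic. The following is a minimal sketch, not a model of any real GPU scheduler: it assumes identical cores, perfectly divisible work, and abstract "ticks" for time. The type `Timeline` and the functions `run_back_to_back` and `run_side_by_side` are hypothetical names introduced only for this illustration.

```c
/* Hypothetical model: a kernel is `work` unit-sized chunks, the device has
 * `cores` identical cores, and one core finishes one chunk per tick. */
typedef struct {
    int first_done;  /* tick at which the first kernel to finish is done */
    int all_done;    /* tick at which both kernels are done */
} Timeline;

/* Integer division rounded up. */
static int ceil_div(int a, int b) { return (a + b - 1) / b; }

/* Non-concurrent case: kernel 1 uses every core, then kernel 2 does. */
Timeline run_back_to_back(int work1, int work2, int cores)
{
    Timeline t;
    t.first_done = ceil_div(work1, cores);
    t.all_done = t.first_done + ceil_div(work2, cores);
    return t;
}

/* Concurrent case: kernel 1 gets `cores1` cores, kernel 2 gets the rest,
 * and both start at tick 0. Requires 0 < cores1 < cores. */
Timeline run_side_by_side(int work1, int work2, int cores, int cores1)
{
    int t1 = ceil_div(work1, cores1);
    int t2 = ceil_div(work2, cores - cores1);
    Timeline t;
    t.first_done = t1 < t2 ? t1 : t2;
    t.all_done = t1 > t2 ? t1 : t2;
    return t;
}
```

With the numbers from the diagrams (4 cores, 20 chunks of work per kernel), `run_back_to_back(20, 20, 4)` finishes the first kernel at tick 5 and both at tick 10, while `run_side_by_side(20, 20, 4, 2)` finishes both only at tick 10. The total is the same, but the back-to-back schedule delivers the first result earlier.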

What you want to do, as far as I understand, is run multiple kernels fully concurrently: for example, queue 100 kernels (the same kernel or different ones) and have them all run at the same time.

That does not fit the OpenCL model at all, and it may actually be slower than a single CPU thread.

If each kernel is independent of all the others, a core (a SIMD unit or a CPU core) can only be allocated to one kernel at a time, because it has only one program counter (PC), even though it could run 1,000 threads at the same time. In an ideal scenario, this turns your OpenCL device into a pool of a few cores (6-10) that serially consume the queued kernels. And that assumes both the API and the device support it, which is not always the case. In the worst case you will have a single device that runs a single kernel and is 99% wasted.
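To put rough numbers on that argument, here is a small sketch of the utilization it implies. It follows the simplified picture in the paragraph above (one kernel per core at a time) and is not a description of how any particular device actually schedules work. The function `lane_utilization` and the device sizes used below (8 cores with 128 lanes each) are assumptions chosen for illustration.

```c
/* Fraction of the device's lanes doing useful work, under the simplified
 * assumption that each core hosts at most one kernel at a time and a
 * kernel never spreads across more than one core. */
double lane_utilization(int cores, int lanes_per_core,
                        int kernels_in_flight, int work_items_per_kernel)
{
    /* At most one kernel per core, so extra kernels just wait in the queue. */
    int busy_cores = kernels_in_flight < cores ? kernels_in_flight : cores;

    /* A kernel cannot use more lanes than its core has. */
    int used_lanes = work_items_per_kernel < lanes_per_core
                         ? work_items_per_kernel
                         : lanes_per_core;

    return (double)(busy_cores * used_lanes) /
           (double)(cores * lanes_per_core);
}
```

On the hypothetical 8 x 128 device, a single kernel that fills its core gives `lane_utilization(8, 128, 1, 128)`, which is 0.125, so seven eighths of the device sits idle. A hundred queued kernels with one work item each give `lane_utilization(8, 128, 100, 1)`, which is under 1%, in line with the "99% wasted" worst case.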

Examples of things that can be done in OpenCL:

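Data crunching and processing: multiplying vectors, simulating particles, etc.
Image processing: border detection, filtering, etc.
Video compression, editing, generation
Ray tracing, complex light math, etc.
Sorting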

Examples of things that are not suitable for OpenCL:

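Attending to asynchronous requests (HTTP, traffic, interactive data)
Processing small amounts of data
Processing data that needs a completely different treatment for each type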

From my point of view, the only real use case for multiple kernels is the last one, and no matter what you do the performance will be horrible in that case. Better to use a multithreaded pool instead.
