clEnqueueNDRangeKernel blocks execution

Question

I've been trying to analyze the results of my kernel in parallel with its execution by breaking the work up into multiple calls. While clEnqueueReadBuffer has a boolean that determines whether it blocks, clEnqueueNDRangeKernel has none, so I had assumed it was always asynchronous (it is being "enqueued", after all, which made me assume it would act like a task queue). However, when I run this block of code, the outer code doesn't execute until the kernel has finished completely. I am not explicitly calling clFinish or anything else that would cause this behavior.

I'm running the kernel on an NVIDIA GPU. Why is this segment of code blocking, and what can I do to remedy it within OpenCL? Otherwise, I'm considering running a separate thread solely to enqueue these kernel commands to the queue.

const size_t amountPerGo = multipleRoundUp(local_ws, (int)(50000));
// Finds the smallest multiple of the local work size that is greater than the 50000-element segment

std::cout << "Launch" << std::endl;

for( int j = 0; j < 10; j++ ) //Make the effects more extreme
{
    for( size_t i = 0; i < dimensions.x*dimensions.y; i+= amountPerGo )
    {
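        // sizeof(int) implies an int argument, so pass an int rather than the address of a size_t
        const int offset = static_cast<int>(i);
        clSetKernelArg(rayKernel, 6, sizeof(int), &offset);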
        std::cout << "sub" << std::endl;

        error = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &amountPerGo, &local_ws, 0, NULL, NULL);

        // Reading back
        clEnqueueReadBuffer(queue, outResult, CL_FALSE, sizeof(vec4)*i, sizeof(vec4)*(amountPerGo), resultSet+i, 0, NULL, NULL);
    }
}

std::cout << "End launch Start" << std::endl;

Recommended answer

It is possible that the OpenCL kernel is executing while its arguments are being set. Try using different kernel objects.
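
As a rough illustration of that suggestion, the sketch below plans the chunked launches so that consecutive chunks are assigned to different kernel objects. It is plain C++ with no OpenCL calls, and the names Chunk, roundUpToMultiple, planChunks and kernelSlot are hypothetical helpers introduced here, not part of the question's code or of the OpenCL API. It also clamps the read-back count for the last chunk, on the assumption that the output buffer holds exactly `total` elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One slice of the 1-D range, intended to map to one clEnqueueNDRangeKernel call.
struct Chunk {
    std::size_t offset;      // first element index, passed to the kernel as its offset argument
    std::size_t globalSize;  // launch size, a multiple of the local work size
    std::size_t readCount;   // elements to read back, clamped to the end of the buffer
    std::size_t kernelSlot;  // index into a (hypothetical) pool of separate kernel objects
};

// Rounds value up to the next multiple of `multiple`
// (presumably what multipleRoundUp in the question does).
std::size_t roundUpToMultiple(std::size_t value, std::size_t multiple) {
    assert(multiple > 0);
    const std::size_t remainder = value % multiple;
    return remainder == 0 ? value : value + (multiple - remainder);
}

// Splits `total` elements into chunks of roughly `target` elements and
// assigns the kernel slots round-robin so neighbouring chunks differ.
std::vector<Chunk> planChunks(std::size_t total, std::size_t localSize,
                              std::size_t target, std::size_t kernelCount) {
    assert(target > 0);
    assert(kernelCount > 0);
    const std::size_t step = roundUpToMultiple(target, localSize);
    std::vector<Chunk> chunks;
    for (std::size_t offset = 0; offset < total; offset += step) {
        const std::size_t remaining = total - offset;
        Chunk c;
        c.offset = offset;
        c.globalSize = step;
        c.readCount = remaining < step ? remaining : step;
        c.kernelSlot = chunks.size() % kernelCount;
        chunks.push_back(c);
    }
    return chunks;
}
```

In the enqueue loop, each chunk would then set its arguments on the kernel object selected by kernelSlot (each created by its own clCreateKernel call for the same kernel name), enqueue that kernel, and issue a non-blocking clEnqueueReadBuffer for readCount elements. Calling clFlush on the queue after each enqueue may also help the driver submit work earlier. Whether this removes the blocking on a particular NVIDIA driver is not confirmed by the answer, so it is worth timing the enqueue calls to check.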
