Does OpenCL support a randomly accessed global queue buffer?

Question

I am writing a kernel that processes combinatorial data. These problems generally have a large problem space in which most of the processed data is junk, so is there a way I could do the following?

(1) If the calculated data passes some condition, it is put onto a global output buffer.

(2) Once the output buffer is full, the data is sent back to the host.

(3) The host takes a copy of the data from the buffer and clears it.

(4) The host then creates a new buffer to be filled by the GPU.

For simplicity, the example can be stated as a selective inner product, by which I mean:

__global int buffer_counter; // Counts 

void put_onto_output_buffer(float value, __global float *buffer, int size)
{
    // Put this value onto the global buffer or send a signal to the host
}

__kernel void
inner_product(
    __global const float *threshold,       // threshold
    __global const float *first_vector,    // 10000 float vector
    __global const float *second_vector,   // 10000 float vector
    __global float *output_buffer,         // 100 float vector
    __global const int *output_buffer_size) // size of the output buffer -- 100
{
    int id = get_global_id(0);
    float value = first_vector[id] * second_vector[id];
    if (value >= threshold[0])
        put_onto_output_buffer(value, output_buffer, output_buffer_size[0]); 
}

Recommended answer

It depends on how often output is written. If it is frequent (a work item writes output more often than not), then buffer_counter will be a source of contention and will cause slowdowns. It also has to be updated with atomic operations, which is why it is slow. In this case you are better off always writing output and sorting out the real results later.
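A minimal sketch of the "always write, filter later" approach, under stated assumptions: the kernel name dense_product, the keep flag array, and the host helper filter_results are hypothetical and not part of the question. The section under #ifndef __OPENCL_VERSION__ is only a single-threaded host shim so the kernel body can be compiled and stepped through as plain C; an OpenCL compiler defines __OPENCL_VERSION__ and skips it.

```c
#ifndef __OPENCL_VERSION__
/* Host-only section: plain-C shim plus the filtering pass (illustrative assumptions). */
#include <stddef.h>
#define __kernel
#define __global
static size_t shim_global_id = 0;   /* set by the host loop to mimic a work item ID */
static size_t get_global_id(unsigned dim) { (void)dim; return shim_global_id; }

/* Copy the kept values to the front of `compact`; returns how many were kept. */
static size_t filter_results(const float *values, const int *keep,
                             size_t n, float *compact)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; ++i)
        if (keep[i])
            compact[kept++] = values[i];
    return kept;
}
#endif

/* Every work item writes to its own slot, so no atomics are needed. */
__kernel void dense_product(
    __global const float *first_vector,
    __global const float *second_vector,
    const float threshold,
    __global float *values,   /* one slot per work item */
    __global int *keep)       /* 1 if the value passed the threshold, else 0 */
{
    size_t id = get_global_id(0);
    float value = first_vector[id] * second_vector[id];
    values[id] = value;
    keep[id] = (value >= threshold);
}
```

The cost of this variant is an output buffer as large as the input, which is why it only pays off when most work items produce a result.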

On the other hand, if writing output is fairly infrequent, then using an atomic position indicator makes good sense. The majority of work items will do their computation, decide they have no output, and retire. Only the infrequent ones that have output will contend over the atomic output position index, serially increment it, and write their output at their unique location. Your output memory will compactly contain the results (in no particular order, so store the work item ID if you care).
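The atomic position index described above might look roughly like the following sketch. The names selective_product, output_ids, output_count, and capacity are assumptions made for illustration, and the threshold is passed by value rather than through a buffer. atomic_inc on a volatile __global int pointer is the OpenCL 1.1 built-in that returns the value before the increment. The block under #ifndef __OPENCL_VERSION__ is a non-atomic, single-threaded host shim that exists only so the kernel can be compiled and checked as plain C; an OpenCL compiler skips it.

```c
#ifndef __OPENCL_VERSION__
/* Plain-C shim (assumption, for single-threaded checking on the host only). */
#include <stddef.h>
#define __kernel
#define __global
static size_t shim_global_id = 0;   /* set by the host loop to mimic a work item ID */
static size_t get_global_id(unsigned dim) { (void)dim; return shim_global_id; }
/* Non-atomic stand-in: returns the old value, as OpenCL's atomic_inc does. */
static int atomic_inc(volatile int *p) { int old = *p; *p = old + 1; return old; }
#endif

__kernel void selective_product(
    __global const float *first_vector,
    __global const float *second_vector,
    const float threshold,
    __global float *output_buffer,         /* holds up to `capacity` results */
    __global int *output_ids,              /* work item ID for each result */
    volatile __global int *output_count,   /* host must set this to 0 before launch */
    const int capacity)
{
    int id = (int)get_global_id(0);
    float value = first_vector[id] * second_vector[id];
    if (value >= threshold) {
        /* Reserve a unique slot; only work items with output reach this point. */
        int slot = atomic_inc(output_count);
        if (slot < capacity) {
            output_buffer[slot] = value;
            output_ids[slot] = id;
        }
        /* If slot >= capacity the result is dropped, but the counter still
           grows, so the host can see that the buffer overflowed. */
    }
}
```

After the kernel finishes, the host could read output_count back first: a value above capacity would mean some results were dropped, so the launch needs a larger buffer or has to be split into smaller batches.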

Again, do read up on atomics, because the index needs to be atomic.
