Atomic max for floats in OpenCL

Question

I need an atomic max function for floats in OpenCL. This is my current naive code, which uses atomic_xchg:

float value = data[index];
if ( value  > *max_value )
{
    atomic_xchg(max_value, value);
}

This code gives the correct result on an Intel CPU, but not on an Nvidia GPU. Is this code correct, or can anyone help me?

Answer

You can do it like this:

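// Atomic max on a float in global memory, built from a 32-bit atomic_cmpxchg
inline void AtomicMax(volatile __global float *source, const float operand) {
    // The unions let the same 32 bits be read as a float (for max) or as an unsigned int (for cmpxchg)
    union {
        unsigned int intVal;
        float floatVal;
    } newVal;
    union {
        unsigned int intVal;
        float floatVal;
    } prevVal;
    do {
        prevVal.floatVal = *source;                        // snapshot the current maximum
        newVal.floatVal = max(prevVal.floatVal, operand);  // candidate new maximum
        // Store newVal only if *source still holds prevVal; if another work-item changed it
        // in the meantime, cmpxchg returns a different value and the loop retries
    } while (atomic_cmpxchg((volatile __global unsigned int *)source, prevVal.intVal, newVal.intVal) != prevVal.intVal);
}

__kernel void mykern(__global float *data, __global float *max_value) {
    unsigned int index = get_global_id(0);

    float value = data[index];
    AtomicMax(max_value, value);
}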

As described in this LINK.

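What it does is overlay a float and an unsigned int in a union. The max is computed on the float, but atomic_cmpxchg compares the integer bit patterns: the new value is stored only if the word in memory still equals the one that was read. If another work-item changed it in the meantime, the comparison fails and the loop retries with the fresh value. This is also why the original code is wrong: the test value > *max_value and the atomic_xchg are two separate steps, so another work-item can store a larger maximum between them, and the exchange then overwrites it with a smaller value.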
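
To make the difference concrete, here is a small host-side C sketch (not OpenCL) that replays one fixed schedule deterministically: work-item A (value 3.0) reads the maximum, work-item B (value 5.0) runs to completion, and then A finishes. The starting value 1.0 and the two operands are arbitrary; cmpxchg_sim is a hypothetical, deliberately non-atomic stand-in for atomic_cmpxchg that exists only so the interleaving can be replayed step by step, and memcpy plays the role of the union.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret float bits as uint32_t and back (the role of the union in AtomicMax). */
static uint32_t float_to_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_to_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Like OpenCL's max() for non-NaN inputs: returns y if x < y, else x. */
static float max_f(float x, float y) {
    assert(!isnan(x) && !isnan(y));
    return x < y ? y : x;
}

/* Hypothetical, NON-atomic stand-in for atomic_cmpxchg: stores desired only if
   *p == expected and always returns the old value. Used only to replay a schedule. */
static uint32_t cmpxchg_sim(uint32_t *p, uint32_t expected, uint32_t desired) {
    uint32_t old = *p;
    if (old == expected)
        *p = desired;
    return old;
}

/* Question's code. Schedule: A (value 3) tests, B (value 5) runs to completion, A writes. */
static float naive_replay(void) {
    float max_value = 1.0f;
    const float a = 3.0f, b = 5.0f;
    int a_will_write = a > max_value;   /* A: 3 > 1 */
    if (b > max_value)                  /* B: 5 > 1 */
        max_value = b;                  /* B stores 5 */
    if (a_will_write)
        max_value = a;                  /* A's exchange stores 3: B's update is lost */
    return max_value;
}

/* Answer's loop under the same schedule; *a_attempts counts A's cmpxchg calls. */
static float cas_replay(int *a_attempts) {
    uint32_t max_value = float_to_bits(1.0f);
    const float a = 3.0f, b = 5.0f;
    uint32_t prev, next;
    int b_done = 0;
    *a_attempts = 0;
    do {
        prev = max_value;                                   /* A reads 1 (then 5) */
        next = float_to_bits(max_f(bits_to_float(prev), a));
        if (!b_done) {                                      /* B's AtomicMax runs here */
            uint32_t b_prev = max_value;
            cmpxchg_sim(&max_value, b_prev, float_to_bits(max_f(bits_to_float(b_prev), b)));
            b_done = 1;
        }
        ++*a_attempts;
    } while (cmpxchg_sim(&max_value, prev, next) != prev);  /* fails once, then succeeds */
    return bits_to_float(max_value);
}
```

Under this schedule naive_replay() ends with 3.0, while cas_replay() ends with 5.0 after A's first compare-and-swap fails and its second attempt succeeds.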

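However, this approach can slow things down considerably, especially when many work-items update the same location, because every failed atomic_cmpxchg forces another iteration of the loop. Use it carefully.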
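
One common way to trim that cost, shown here as a host-side C11 sketch rather than OpenCL, is to skip the compare-and-swap entirely when the current maximum is already at least as large as the operand. This relies on two assumptions, stated in the code as well: no operand is NaN, and the location is only ever written through this function, so its value never decreases. The names atomic_max_float_early_exit, float_to_bits and bits_to_float are hypothetical, and the attempt counter it returns exists only for illustration.

```c
#include <assert.h>
#include <math.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Reinterpret float bits as uint32_t and back (memcpy instead of a union). */
static uint32_t float_to_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_to_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Atomic max on a float stored as bits in an atomic 32-bit word.
   Assumes: no NaN operands, and the word is only ever updated through this
   function, so its value never decreases. Returns the number of CAS attempts
   (for illustration only); 0 means no compare-and-swap was needed. */
static int atomic_max_float_early_exit(_Atomic uint32_t *source, float operand) {
    assert(!isnan(operand));
    int attempts = 0;
    uint32_t prev = atomic_load(source);
    /* Only try to store while the current maximum is smaller than operand.
       Because the maximum never decreases, once it is >= operand we are done. */
    while (bits_to_float(prev) < operand) {
        ++attempts;
        /* On failure, prev is reloaded with the current contents of *source. */
        if (atomic_compare_exchange_weak(source, &prev, float_to_bits(operand)))
            break;
    }
    return attempts;
}
```

The word must be initialised before the first update, for example with float_to_bits(-INFINITY). Carried back to the OpenCL kernel, the same idea would be to return from AtomicMax as soon as prevVal.floatVal >= operand instead of issuing another atomic_cmpxchg; how much that helps depends on how often the maximum actually changes.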
