Calculate squared Euclidean distance matrix on GPU


Question

Let p be the matrix of the first set of locations, where each row gives the coordinates of a particular point. Similarly, let q be the matrix of the second set of locations, where each row gives the coordinates of a particular point.

Then the formula for the pairwise squared Euclidean distance is:

k(i,j) = (p(i,:) - q(j,:))*(p(i,:) - q(j,:))',

where p(i,:) denotes the i-th row of matrix p, and p' denotes the transpose of p.

I would like to compute the matrix k on a CUDA-enabled GPU (NVIDIA Tesla) in C++. I have OpenCV v.2.4.1 with GPU support, but I'm open to other alternatives, such as the Thrust library. However, I'm not too familiar with GPU programming. Can you suggest an efficient way to accomplish this task? What C++ libraries should I use?
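For reference, the formula can be sketched on the CPU as plain C++, which is handy for checking a GPU result on small inputs. This is only a sketch: the function name `pairwise_sq_dist` is hypothetical, and it assumes p (n x d), q (m x d) and k (n x m) are stored row-major in flat arrays of float, none of which is stated in the question.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU reference, not part of any library.
// Assumed layout: p is n x d, q is m x d, result k is n x m, all row-major.
std::vector<float> pairwise_sq_dist(const std::vector<float>& p, std::size_t n,
                                    const std::vector<float>& q, std::size_t m,
                                    std::size_t d) {
    std::vector<float> k(n * m, 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < m; ++j) {
            float sum = 0.0f;
            // Accumulate the squared difference over all d coordinates.
            for (std::size_t kk = 0; kk < d; ++kk) {
                const float diff = p[i * d + kk] - q[j * d + kk];
                sum += diff * diff;
            }
            k[i * m + j] = sum;  // k(i,j)
        }
    }
    return k;
}
```

With p holding the points (0,0) and (1,1) and q holding the single point (3,4), the result would be the 2 x 1 matrix with entries 25 and 13.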

Answer

The problem looks simple enough to make a library overkill.

Without knowing the range of i and j, I'd suggest you partition k into blocks of a multiple of 32 threads each and in each block, compute

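// Body executed by each thread: thread i handles row i of p and the
// blockDim.y columns of k assigned to its block.
// d must be a compile-time constant for myp[d] to be a valid local array.
float myp[d];
int i = blockIdx.x*blockDim.x + threadIdx.x;
// Cache row i of p in thread-local storage.
for ( int kk = 0 ; kk < d ; kk++ )
    myp[kk] = p(i,kk);
for ( int j = blockIdx.y*blockDim.y ; j < (blockIdx.y+1)*blockDim.y ; j++ ) {
    float sum = 0.0f;
    #pragma unroll
    for ( int kk = 0 ; kk < d ; kk++ ) {
        float temp = myp[kk] - q(j,kk);
        sum += temp*temp;
    }
    k(i,j) = sum;
}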

where I am assuming that your data has d dimensions, and writing p(i,k), q(j,k) and k(i,j) to mean an access to a two-dimensional array. I also took the liberty of assuming that your data is of type float.
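The answer leaves the launch geometry open. As a rough host-side sketch of the arithmetic involved, the two helpers below (both hypothetical names) round a block size up to a multiple of the 32-thread warp size and count how many blocks are needed to cover all rows. When the number of rows is not a multiple of the block size, the last block has threads with i past the end, so the kernel would also need a guard such as `if (i < n)`.

```cpp
#include <cstddef>

// Hypothetical helper: number of blocks needed to cover `count` items
// when each block handles `per_block` items (rounds up).
constexpr std::size_t ceil_div(std::size_t count, std::size_t per_block) {
    return (count + per_block - 1) / per_block;
}

// Hypothetical helper: round a requested block size up to a multiple
// of 32, the warp size on NVIDIA GPUs.
constexpr std::size_t round_up_to_warp(std::size_t threads) {
    return ceil_div(threads, 32) * 32;
}
```

For example, 1000 rows with 128 threads per block would need `ceil_div(1000, 128)` blocks, and the threads of the last block whose i is 1000 or more must not read p or write k.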

Note that depending on how k is stored, e.g. row-major or column-major, you may want to loop over i per thread instead to get coalesced writes to k.
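To make the storage-order remark concrete, here is a small sketch of how k(i,j) maps to a linear offset under each layout. The function names are hypothetical, and k is assumed to be an n x m matrix in one flat array. Writes are coalesced when neighbouring threads touch neighbouring addresses, so what matters is the distance between k(i,j) and k(i+1,j) when threads are indexed by i.

```cpp
#include <cstddef>

// Hypothetical index helpers for an n x m matrix k stored in a flat array.

// Row-major: row i is contiguous, so k(i,j) and k(i+1,j) are m elements apart.
inline std::size_t idx_row_major(std::size_t i, std::size_t j,
                                 std::size_t /*n*/, std::size_t m) {
    return i * m + j;
}

// Column-major: column j is contiguous, so k(i,j) and k(i+1,j) are adjacent.
inline std::size_t idx_col_major(std::size_t i, std::size_t j,
                                 std::size_t n, std::size_t /*m*/) {
    return j * n + i;
}
```

With one thread per i, as in the loop above, column-major storage gives adjacent threads adjacent addresses. With row-major storage the stride is m, which is the case where assigning threads to j and looping over i would be the better fit.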
