Calculate squared Euclidean distance matrix on GPU
Problem description
Let `p` be a matrix holding the first set of locations, where each row gives the coordinates of a particular point. Similarly, let `q` be a matrix holding the second set of locations, where each row gives the coordinates of a particular point.
Then the formula for the pairwise squared Euclidean distance is:
k(i,j) = (p(i,:) - q(j,:))*(p(i,:) - q(j,:))',
where `p(i,:)` denotes the `i`-th row of matrix `p`, and `p'` denotes the transpose of `p`.
I would like to compute the matrix `k` on a CUDA-enabled GPU (NVIDIA Tesla) in C++. I have OpenCV v.2.4.1 with GPU support, but I'm open to other alternatives, like the Thrust library. However, I'm not too familiar with GPU programming. Can you suggest an efficient way to accomplish this task? What C++ libraries should I use?
Solution
The problem looks simple enough to make a library overkill.
Without knowing the ranges of `i` and `j`, I'd suggest you partition `k` into blocks of a multiple of 32 threads each and, in each block, compute

    float sum, temp, myp[d];   // d must be a compile-time constant here
    int i = blockIdx.x*blockDim.x + threadIdx.x;
    // cache row i of p in thread-local storage
    for ( int kk = 0 ; kk < d ; kk++ )
        myp[kk] = p(i,kk);
    // walk over the columns j that belong to this block
    for ( int j = blockIdx.y*blockDim.y ; j < (blockIdx.y+1)*blockDim.y ; j++ ) {
        sum = 0.0f;
        #pragma unroll
        for ( int kk = 0 ; kk < d ; kk++ ) {
            temp = myp[kk] - q(j,kk);
            sum += temp*temp;
        }
        k(i,j) = sum;
    }

where I am assuming that your data has `d` dimensions and writing `p(i,k)`, `q(j,k)` and `k(i,j)` to mean an access to a two-dimensional array. I also took the liberty of assuming that your data is of type `float`.
Note that depending on how `k` is stored, e.g. row-major or column-major, you may want to loop over `i` per thread instead to get coalesced writes to `k`, i.e. writes in which consecutive threads touch consecutive memory addresses.
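To make the array notation and the coalescing remark concrete, here is a minimal sketch of a complete kernel with a host wrapper. It is not the exact scheme from the fragment in the answer: for brevity it uses one thread per entry of `k` instead of caching a row of `p` per thread. It assumes that `p` (n x d), `q` (m x d) and `k` (n x m) are all stored row-major as `float`. The names `squaredDistanceKernel`, `squaredDistances` and `CUDA_CHECK`, as well as the 32 x 8 block shape, are made up for this illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Minimal error check (hypothetical helper macro): print and abort on failure.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "CUDA error: %s\n",                  \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Assumed layout: p is n x d, q is m x d, k is n x m, all row-major floats.
// Each thread computes one entry k(i,j). threadIdx.x is mapped to j, so
// consecutive threads write consecutive addresses of row-major k.
__global__ void squaredDistanceKernel(const float* p, const float* q,
                                      float* k, int n, int m, int d)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // column of k, row of q
    int i = blockIdx.y * blockDim.y + threadIdx.y;  // row of k, row of p
    if (i >= n || j >= m) return;                   // guard partial blocks

    float sum = 0.0f;
    for (int kk = 0; kk < d; ++kk) {
        float diff = p[i * d + kk] - q[j * d + kk];
        sum += diff * diff;
    }
    k[i * m + j] = sum;
}

// Hypothetical host wrapper: copies p and q to the device, launches the
// kernel, and returns k as a row-major n x m vector.
std::vector<float> squaredDistances(const std::vector<float>& p,
                                    const std::vector<float>& q,
                                    int n, int m, int d)
{
    float *dp = nullptr, *dq = nullptr, *dk = nullptr;
    size_t kBytes = static_cast<size_t>(n) * m * sizeof(float);
    CUDA_CHECK(cudaMalloc(&dp, p.size() * sizeof(float)));
    CUDA_CHECK(cudaMalloc(&dq, q.size() * sizeof(float)));
    CUDA_CHECK(cudaMalloc(&dk, kBytes));
    CUDA_CHECK(cudaMemcpy(dp, p.data(), p.size() * sizeof(float),
                          cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(dq, q.data(), q.size() * sizeof(float),
                          cudaMemcpyHostToDevice));

    dim3 block(32, 8);  // 256 threads, a multiple of the warp size of 32
    dim3 grid((m + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    squaredDistanceKernel<<<grid, block>>>(dp, dq, dk, n, m, d);
    CUDA_CHECK(cudaGetLastError());

    std::vector<float> k(static_cast<size_t>(n) * m);
    // This blocking copy also waits for the kernel to finish.
    CUDA_CHECK(cudaMemcpy(k.data(), dk, kBytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(dp));
    CUDA_CHECK(cudaFree(dq));
    CUDA_CHECK(cudaFree(dk));
    return k;
}
```

For instance, with the points (0,0) and (1,1) in `p` and (1,0) and (3,4) in `q`, calling `squaredDistances(p, q, 2, 2, 2)` should give the entries 1, 25, 1, 13. If `k` were stored column-major instead, swapping the roles of `threadIdx.x` and `threadIdx.y` would be the natural way to keep the writes coalesced. For large `d`, the register-caching idea from the answer or a shared-memory tiling may well be faster; this sketch does not attempt either.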