Why is solving a system of linear equations using CULA (dgesv) slower than MKL (dgesv) for small data sets?

Views: 83 | Published: 2021/4/27 20:13:11 | Tags: cuda gpgpu intel-mkl cula

Question

I have written CUDA C and C programs to solve the matrix equation Ax = b using the CULA routine dgesv and the MKL routine dgesv, respectively. For small data sets, the CPU program seems to be faster than the GPU program, but the GPU overtakes the CPU once the data set grows past 500. I am using my Dell laptop, which has an i3 CPU and a GeForce 525M GPU. What is the best explanation for the GPU's slower performance on the small data sets?

I wrote another program that takes two vectors, multiplies them, and adds the results. It is like a dot product, except that the result is a vector sum rather than a scalar. In this program, the GPU is faster than the CPU even for small data sets, on the same notebook. Why is the GPU faster for small data sets in this program but not in the one described above? Is it because the summation involves little computation?

Recommended Answer

It's not uncommon for GPUs to be less attractive on small data sets than on large ones. The reasons vary with the specific algorithm. GPUs generally have higher main-memory bandwidth than CPUs and can usually outperform them at heavy-duty number crunching. But GPUs usually only work well when the problem has inherent parallelism that can be exposed. Exploiting that parallelism lets an algorithm tap into both the greater memory bandwidth and the higher compute capability.

However, before the GPU can do anything, the data has to be copied to it (over the PCIe bus, for a discrete GPU like yours). This adds a "cost" to the GPU version of the code that is not normally present in the CPU version.

To be more precise, the GPU will provide a benefit when the reduction in computation time on the GPU (compared with the CPU) exceeds the cost of the data transfer. Solving a dense n-by-n system with dgesv (LU factorization with partial pivoting, followed by forward and back substitution) takes O(n^3) operations, roughly (2/3)n^3 floating-point operations for the factorization. For very small n, this computational work may not be large enough to offset the cost of the data transfer, but as n grows it clearly should. Your vector operation, on the other hand, is only O(n), so the cost-benefit picture will look different.
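
A rough way to see the break-even point is a simple cost model: CPU time is the flop count divided by a CPU throughput, while GPU time adds a fixed per-call overhead and the PCIe transfer on top of a (faster) GPU compute time. The sketch below assumes the textbook dgesv flop count and treats every rate (`cpu_gflops`, `gpu_gflops`, `pcie_gbytes_per_s`, `overhead_s`) as a hypothetical input. The numbers in the example are placeholders, not measurements of an i3 or a GeForce 525M, so the crossover it prints says nothing about the 500 you observed, only about the shape of the trade-off.

```python
# A back-of-the-envelope cost model for dgesv on CPU vs. GPU.
# All hardware rates passed in are hypothetical inputs, not measured values.


def dgesv_flops(n: int) -> float:
    """Approximate flops for an n x n system with one right-hand side:
    LU factorization (~2/3 n^3) plus forward/back substitution (~2 n^2)."""
    return (2.0 / 3.0) * n**3 + 2.0 * n**2


def dgesv_bytes(n: int) -> int:
    """Double-precision bytes crossing PCIe: A (n*n) and b (n) in, x (n) out."""
    return 8 * (n * n + 2 * n)


def cpu_time(n: int, cpu_gflops: float) -> float:
    """Seconds to solve on the CPU; the data is already in host memory."""
    return dgesv_flops(n) / (cpu_gflops * 1e9)


def gpu_time(n: int, gpu_gflops: float, pcie_gbytes_per_s: float,
             overhead_s: float) -> float:
    """Seconds to solve on the GPU, including the transfer 'cost':
    a fixed per-call overhead plus bytes / bandwidth."""
    compute = dgesv_flops(n) / (gpu_gflops * 1e9)
    transfer = overhead_s + dgesv_bytes(n) / (pcie_gbytes_per_s * 1e9)
    return compute + transfer


def break_even_n(cpu_gflops, gpu_gflops, pcie_gbytes_per_s, overhead_s,
                 n_max=10_000):
    """Smallest n at which the GPU (transfers included) beats the CPU."""
    for n in range(1, n_max + 1):
        gpu = gpu_time(n, gpu_gflops, pcie_gbytes_per_s, overhead_s)
        if gpu < cpu_time(n, cpu_gflops):
            return n
    return None


if __name__ == "__main__":
    # Illustrative numbers only: 10 GFLOP/s CPU, 40 GFLOP/s GPU,
    # 3 GB/s effective PCIe bandwidth, 100 microseconds fixed overhead.
    params = dict(cpu_gflops=10.0, gpu_gflops=40.0,
                  pcie_gbytes_per_s=3.0, overhead_s=1e-4)
    print("break-even n:", break_even_n(**params))
```

With a non-trivial overhead and a GPU that is faster per flop, this model has the same shape as your measurements: the GPU loses for small n and wins beyond some crossover, whose location moves with the assumed rates.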

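For the O(n^2) or O(n^3) case, as we move to larger data sets, the "cost" of transferring the data grows more slowly than the compute required for the solution. If n counts the data elements, the transfer grows as O(n) while the compute grows as O(n^2) (or O(n^3)); for dgesv on an n-by-n system, the transfer moves about n^2 values while the LU factorization needs about (2/3)n^3 operations, so the work per transferred value grows in proportion to n. Therefore larger data sets have polynomially larger compute workloads, which reduces the relative effect of the "cost" of the data transfer. An O(n) problem, on the other hand, probably won't have this scaling dynamic: its workload increases at the same rate as the "cost" of the data transfer.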
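
One way to make this scaling concrete is to count floating-point operations per value that has to cross the bus. The sketch below uses the usual approximations for dgesv (about (2/3)n^3 + 2n^2 flops for n^2 + 2n transferred values) and, as an assumed stand-in for the vector program, an elementwise operation doing a fixed number of flops per element with two input vectors and one output vector; `flops_per_element` is a hypothetical parameter, not taken from the question.

```python
# Arithmetic intensity (flops per transferred double) for two workloads.
# Flop counts are standard approximations; the vector workload is an assumed
# stand-in for the program in the question.


def dgesv_flops_per_value(n: int) -> float:
    """dgesv on an n x n system with one right-hand side."""
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2  # LU factorization + two triangular solves
    values = n * n + 2 * n                   # A and b sent to the GPU, x sent back
    return flops / values


def vector_flops_per_value(n: int, flops_per_element: int = 2) -> float:
    """Elementwise op on two length-n vectors producing one length-n vector.

    `flops_per_element` is a hypothetical parameter (e.g. 2 for a multiply
    and an add per element).
    """
    flops = flops_per_element * n
    values = 3 * n                           # two inputs in, one output back
    return flops / values


if __name__ == "__main__":
    for n in (10, 100, 500, 1000, 5000):
        print(f"n={n:5d}  dgesv: {dgesv_flops_per_value(n):9.1f}  "
              f"vector op: {vector_flops_per_value(n):.3f}")
```

For dgesv the ratio grows roughly like (2/3)n, so each transferred value buys more and more computation; for the elementwise operation it stays constant, which is the "same rate" behavior described above.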

Also note that if the "cost" of transferring data to the GPU can be hidden by overlapping it with computation work, then the "cost" for the overlapped portion becomes "free", i.e. it does not contribute to the overall solution time.
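
How much overlap can hide is easy to estimate if the work can be split into equal, independent chunks: while chunk i is being computed, chunk i+1 is being copied. In CUDA this is usually done with multiple streams, `cudaMemcpyAsync` from pinned (page-locked) host memory, and kernels launched in the same streams. The Python sketch below only models the timing of such a two-stage pipeline; it assumes the copies and the computation run fully concurrently and that the work divides into equal chunks, which fits elementwise vector work much better than the dependent steps of an LU factorization.

```python
# Timing model for overlapping host-to-device copies with computation.
# Assumptions: equal-sized independent chunks, copies and compute run
# concurrently, and per-chunk launch overheads are ignored.


def serial_time(copy_s: float, compute_s: float) -> float:
    """Copy everything first, then compute: the costs simply add up."""
    return copy_s + compute_s


def pipelined_time(copy_s: float, compute_s: float, chunks: int) -> float:
    """Two-stage pipeline over `chunks` equal pieces.

    The first chunk's copy cannot be hidden and the last chunk's compute
    cannot be hidden; in between, each step costs whichever stage is slower.
    """
    if chunks < 1:
        raise ValueError("chunks must be at least 1")
    c = copy_s / chunks      # per-chunk copy time
    p = compute_s / chunks   # per-chunk compute time
    return c + (chunks - 1) * max(c, p) + p


if __name__ == "__main__":
    copy_ms, compute_ms = 2.0, 3.0  # hypothetical timings
    print(f"serial: {serial_time(copy_ms, compute_ms):.2f} ms")
    for k in (1, 2, 4, 8, 16):
        print(f"{k:2d} chunks: {pipelined_time(copy_ms, compute_ms, k):.2f} ms")
```

As the number of chunks grows, the pipelined time approaches max(copy, compute), so when computation dominates, the copy becomes effectively "free"; in practice, per-chunk overheads limit how finely the work is worth splitting.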
