Is the Fermi architecture a possible solution for my comparative study?


Question

I am working on a comparative study in which I have to compare the serial and parallel versions of an algorithm (the NSGA-II algorithm, to be precise). NSGA-II is a heuristic optimization method, so its results depend on the initial random population. If the initial populations generated on the CPU and on the GPU are different, then I cannot make an impartial speedup study.

I have an NVIDIA Tesla C1060 card, which has a compute capability of 1.3. As per this answer and this NVIDIA document, we can't expect an sm_13 device to always yield an IEEE-754 compliant float (single precision) value. In other words, on my current device I can't conduct an impartial speedup study of the CUDA program against its serial counterpart.

My question is: would switching to the Fermi architecture solve the problem?

Solution

Floating-point operations can yield different results on different architectures, regardless of whether they support IEEE 754 or not, because floating-point arithmetic is not associative: the order in which operations are evaluated changes the rounding. Even switching compilers on x86 will typically give different results. This whitepaper gives some excellent explanations.
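To make the non-associativity point concrete, here is a minimal sketch in Python. It emulates single precision by rounding every intermediate result to a 32-bit float, then sums the same four values in two orders: left to right, as a plain serial loop would, and pairwise, as a parallel reduction typically would. The helper names (`f32`, `serial_sum`, `tree_sum`) and the input values are made up for illustration; they are not from NSGA-II or CUDA.

```python
import struct


def f32(x):
    """Round a Python float (a double) to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def serial_sum(values):
    """Accumulate left to right, rounding to single precision after every addition."""
    total = 0.0
    for v in values:
        total = f32(total + v)
    return total


def tree_sum(values):
    """Add neighbouring pairs repeatedly, the order a parallel reduction usually follows."""
    vals = list(values)
    while len(vals) > 1:
        paired = [f32(vals[i] + vals[i + 1]) for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:
            paired.append(vals[-1])  # odd element is carried to the next round
        vals = paired
    return vals[0]


# All four inputs are exactly representable in single precision.
data = [f32(1e8), f32(1.0), f32(-1e8), f32(1.0)]

print(serial_sum(data))  # 1.0
print(tree_sum(data))    # 0.0
```

Every individual addition above is correctly rounded, yet the two orders disagree with each other (and with the exact sum, 2.0). So IEEE 754 compliance on both sides would not by itself guarantee that a serial loop and a parallel reduction return identical bits.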

Having said that, your real issue is that you have a data-dependent algorithm, where the operations depend on the random numbers you generate. So if you generate the same numbers on the CPU and the GPU, both runs will follow the same path. Consider using cuRAND, which can generate the same numbers on both the CPU and GPU.
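The idea of feeding both versions the same random numbers can be sketched as follows. This is not cuRAND: it uses Python's standard-library Mersenne Twister purely as a stand-in to show the principle of a fixed seed giving a reproducible initial population. The function name `make_population`, its parameters, and the seed value are hypothetical.

```python
import random


def make_population(seed, pop_size, n_vars, lower=0.0, upper=1.0):
    """Build an initial population from a fixed seed.

    Hypothetical helper: each individual is a list of n_vars real values
    drawn uniformly from [lower, upper).
    """
    rng = random.Random(seed)  # private generator, independent of global state
    return [
        [lower + (upper - lower) * rng.random() for _ in range(n_vars)]
        for _ in range(pop_size)
    ]


# The same seed gives the same population, so two implementations handed
# this data start from an identical state.
pop_for_serial = make_population(seed=42, pop_size=4, n_vars=3)
pop_for_parallel = make_population(seed=42, pop_size=4, n_vars=3)
print(pop_for_serial == pop_for_parallel)  # True
```

In a CUDA setting the analogous move would be either to generate the population once on the host and copy it to the device, or to use the same cuRAND generator type and seed on both sides (the host API has `curandCreateGeneratorHost` for host-side generators). Note that this only makes the starting point identical; later floating-point differences can still make the two runs diverge.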
