OpenCL speed and floating-point precision


Question

I have just started working with OpenCL, and I have found some strange behavior that I can't understand. The source I built and tested is from http://www.codeproject.com/Articles/110685/Part-1-OpenCL-Portable-Parallelism . I have an ATI Radeon HD 4770 and an AMD FX 6200 3.8 GHz 6-core CPU.

Firstly, the speed does not scale linearly with the maximum number of work items per work group. I ran App Profiler to measure the time spent during kernel execution. The result was a bit shocking: my GPU, which can only handle 256 work items per group, took 2.23008 milliseconds to calculate the squares of 5079040 numbers. Note that this does not include the kernel loading time.

However, my CPU, which can handle 1024 work items per group, took 13.41895 milliseconds for the same calculation. I thought that the work items in a work group run simultaneously, so the CPU should have been faster. What I want to know is: do work groups run simultaneously? For example, in my setup, would the GPU run more work groups simultaneously than the CPU?
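To put the two timings side by side, here is a small arithmetic sketch. It uses only the figures quoted above and assumes that each device was launched with its maximum work group size, which the question does not state explicitly. The variable and function names are illustrative.

```python
# Figures quoted in the question (profiler kernel times, kernel load time excluded).
N = 5_079_040        # numbers to square
GPU_MS = 2.23008     # Radeon HD 4770, at most 256 work items per group
CPU_MS = 13.41895    # AMD FX 6200, at most 1024 work items per group


def groups_needed(n, group_size):
    """Work groups required to cover n work items (ceiling division)."""
    return -(-n // group_size)


# Assumption: each device is launched with its maximum work group size.
gpu_groups = groups_needed(N, 256)
cpu_groups = groups_needed(N, 1024)

# How many times longer the CPU kernel took than the GPU kernel.
speedup = CPU_MS / GPU_MS

print(gpu_groups, cpu_groups, round(speedup, 2))
```

Under that assumption the GPU is handed four times as many work groups as the CPU and still finishes roughly six times sooner, so the maximum work group size on its own does not predict the run time.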

Another factor may be that the GPU is faster at floating-point arithmetic, but my CPU has a clock speed four times higher, so it is still strange. I know that the GPU is normally faster when it comes to OpenCL, but I would like a good explanation of why.

Edit: I tried calculating 1024, 2048 ... 5120 work items, and now the CPU was faster than the GPU. So I have learned that the CPU works better with few work items, while the GPU is best with many work items.

I also saw that my CPU did the calculation much slower at every third multiple of the work group size (4096, 6144, 8192). So it looks like my CPU takes three work groups simultaneously.

Question moved here: OpenCL floating point precision

Thanks in advance for all answers.

Recommended answer


What I want to know is: do work groups run simultaneously? For example, in my setup, would the GPU run more work groups simultaneously than the CPU?

There is a great answer to that question here: Are OpenCL work items executed in parallel?
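The linked answer is not reproduced here. As a rough mental model only, one can picture a device working through its work groups in rounds, several groups at a time. The sketch below is a toy model, not a description of either device: `waves` and `concurrent_groups` are hypothetical names, and the concurrency values are made up for illustration rather than queried from the hardware.

```python
import math


def waves(total_items, group_size, concurrent_groups):
    """Scheduling rounds in a toy model where the device runs
    `concurrent_groups` work groups at a time (hypothetical parameter)."""
    groups = math.ceil(total_items / group_size)
    return math.ceil(groups / concurrent_groups)


N = 5_079_040  # problem size from the question

# Made-up concurrency values, NOT measured on the HD 4770 or the FX 6200.
small_groups_wide_device = waves(N, group_size=256, concurrent_groups=16)
large_groups_narrow_device = waves(N, group_size=1024, concurrent_groups=2)

# A tiny problem fits in a single round on both hypothetical devices.
tiny_wide = waves(1024, group_size=256, concurrent_groups=16)
tiny_narrow = waves(1024, group_size=1024, concurrent_groups=2)

print(small_groups_wide_device, large_groups_narrow_device, tiny_wide, tiny_narrow)
```

In this model, a device with a smaller maximum group size can still need fewer rounds when it runs more groups at once. For very small inputs both devices need a single round, so fixed per-launch costs would matter more than parallel width, which is at least consistent with the CPU being faster for a few thousand work items.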
