Could a 6-core processor outperform a graphics card?


Question


I created a VS2013 project to test OpenCL. It is in the OpenCL directory of this GitHub repository:

GitHub - jlopez2022/cpp_utils: Example of C++ programs


In that example I calculated the differential RMS of a large vector (200 mega elements). On the CPU in debug mode it ran at 100 Mops/data.

On the CPU in release mode the calculation ran at about 400 Mops/data (so I suppose it used the 4 cores in parallel).

Then I also ran it on the GPU and obtained 600 Mops/data.

So in theory, if I used a 6-core CPU, it should beat the GPU unless the CPU-GPU bandwidth were increased.

The CPU was a 4-core E5 at 3.5 GHz.
The GPU was a Radeon R9 390 with 2560 cores at 1 GHz.

In theory the GPU is 182 times faster than the CPU, but I suppose that, unfortunately, the CPU needs a lot of time to copy the data to GPU memory.
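
The figure of 182 appears to come from comparing core count times clock rate on each device. The post does not say how it was derived, so this is an assumption, and it further assumes that every core on either device completes one operation per cycle:

```latex
\[
\frac{2560 \times 1\,\mathrm{GHz}}{4 \times 3.5\,\mathrm{GHz}} = \frac{2560}{14} \approx 182.9
\]
```

This ratio covers raw arithmetic only. It says nothing about the time spent moving data between host and GPU memory.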

What I have tried:

GitHub - jlopez2022/cpp_utils: Example of C++ programs

Recommended answer

The speedup in release mode does not come from parallel processing. It comes from the compiler optimising the code in release mode and omitting the additional checks that are performed in debug builds.

You have to explicitly write code for parallel processing.
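
As a rough illustration of what explicitly parallel code could look like, here is a minimal sketch using std::thread. It is not the code from the repository. It assumes that "differential RMS" means the RMS of the element-wise difference between two equally sized vectors, and the names sum_sq_diff and rms_diff_parallel are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// Sum of squared differences over [begin, end): the serial building block.
double sum_sq_diff(const std::vector<float>& a, const std::vector<float>& b,
                   std::size_t begin, std::size_t end) {
    double acc = 0.0;
    for (std::size_t i = begin; i < end; ++i) {
        const double d = static_cast<double>(a[i]) - b[i];
        acc += d * d;
    }
    return acc;
}

// Hypothetical helper: RMS of (a - b), with the index range split into
// contiguous chunks, one per worker thread.
// Assumption: a.size() == b.size().
double rms_diff_parallel(const std::vector<float>& a, const std::vector<float>& b,
                         unsigned num_threads) {
    const std::size_t n = a.size();
    if (n == 0) return 0.0;
    num_threads = std::max(1u, num_threads);

    std::vector<double> partial(num_threads, 0.0);  // one slot per thread
    std::vector<std::thread> workers;
    const std::size_t chunk = (n + num_threads - 1) / num_threads;

    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        // Each thread writes only its own slot, so no locking is needed.
        workers.emplace_back([&, t, begin, end] {
            partial[t] = sum_sq_diff(a, b, begin, end);
        });
    }
    for (auto& w : workers) w.join();

    double total = 0.0;
    for (double p : partial) total += p;
    return std::sqrt(total / static_cast<double>(n));
}
```

A call such as rms_diff_parallel(a, b, std::thread::hardware_concurrency()) would use one thread per logical core. Some toolchains need a threading flag such as -pthread when linking.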

Which method is ultimately faster (parallel CPU code or GPU code) cannot be answered in general. The only reliable approach is to implement both and compare the results. Even then, the outcome depends on the hardware used (CPU, GPU, number of threads and cores), so the results will differ between systems.

In such cases you can still implement both and either provide a user option to select the method or run short tests so that your application chooses the fastest method itself.

