OpenCL: comparing the time required to add two arrays of integers on available platforms/devices

Question

I'm very new to the whole OpenCL world, so I'm following some beginner tutorials. I'm trying to combine this and this to compare the time required to add two arrays together on different devices. However, I'm getting confusing results. Since the code is too long to post here, I made this GitHub Gist.

On my Mac I have 1 platform with 3 devices. When I set the j in

cl_command_queue command_queue = clCreateCommandQueue(context, device_id[j], 0, &ret);

manually to 0, the calculation seems to run on the CPU (about 5.75 seconds). With 1 or 2, the calculation time drops drastically (0.01076 seconds), which I assume is because the calculation is running on my Intel or AMD GPU. But then there are some issues:

  1. I can set j to any higher number and it still seems to run on the GPUs.
  2. When I put the whole calculation in a loop, the time measured for every device is the same as when calculating on the CPU (I presume).
  3. The times required for the calculation for all j > 0 are suspiciously close. I wonder if they are really being run on different devices.

I clearly have no clue about OpenCL, so I would appreciate it if you could take a look at my code and let me know what my mistakes are and how I can fix them. Or maybe point me towards a good example that runs a calculation on different devices and compares the times.

PS: I have also posted this question on Reddit.

Recommended answer

Before submitting a question about an issue you are having, always remember to check for errors (specifically, in this case, that every API call returns CL_SUCCESS). The results are meaningless otherwise.
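
A minimal sketch of such a check follows. The helpers `cl_status_name` and `cl_ok` are hypothetical names introduced here, not part of OpenCL. So that the sketch compiles without the OpenCL headers, the status is taken as a plain `int` and the numeric values are copied from the Khronos `cl.h` header; in real code you would include the header and switch on the named constants instead.

```c
#include <stdio.h>

/* Hypothetical helper, not part of OpenCL: maps a few common status codes to
 * their names. The numeric values mirror the constants in the Khronos cl.h
 * header; the list is deliberately incomplete. */
const char *cl_status_name(int status)
{
    switch (status) {
    case 0:   return "CL_SUCCESS";
    case -1:  return "CL_DEVICE_NOT_FOUND";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -11: return "CL_BUILD_PROGRAM_FAILURE";
    case -30: return "CL_INVALID_VALUE";
    case -33: return "CL_INVALID_DEVICE";
    case -34: return "CL_INVALID_CONTEXT";
    case -36: return "CL_INVALID_COMMAND_QUEUE";
    default:  return "unlisted OpenCL error";
    }
}

/* Returns 1 when the call succeeded; otherwise reports which call failed on
 * stderr and returns 0. */
int cl_ok(int status, const char *call)
{
    if (status == 0)
        return 1;
    fprintf(stderr, "%s failed: %s (%d)\n", call, cl_status_name(status), status);
    return 0;
}
```

With this in place, the queue creation from the question could be followed by `if (!cl_ok(ret, "clCreateCommandQueue")) return 1;`, and likewise for every other call that returns or outputs a status.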

In this specific case, the problem in your code is that when getting the device IDs, you're only requesting one device ID (line 60, third argument), meaning that everything else in the buffer is bogus, and the results for j > 0 are meaningless.

The only surprising thing is that it doesn't crash.

Also, when measuring runtimes, use OpenCL events, not host-side clock times. In your case you're at least stopping the clock after the clFinish, so you are ensuring that the kernel execution has terminated, but you're essentially counting the time needed for all the setup, rather than just the copy time.
