Threading OpenCL compiling

Problem description

Update: I'm now spawning multiple processes and it works fairly well, though the underlying threading problem still exists.

I'm trying to thread a C++ (g++ 4.6.1) program that compiles a bunch of OpenCL kernels. Most of the time is spent inside clBuildProgram. (It's genetic programming, and actually running the code and evaluating fitness is much, much faster.) I'm trying to compile these kernels on multiple threads and haven't had any luck so far. At this point there's no shared data between threads (aside from the same platform and device reference), but only one thread runs at a time. I can run this code as several processes (just launching them in different terminal windows in Linux) and then it uses multiple cores, but not within one process. I can use multiple cores with the same basic threading code (std::thread) when it only does basic math, so I think it has something to do with either the OpenCL compile or some static data I forgot about. :) Any ideas? I've done my best to make this thread-safe, so I'm stumped.

I'm using AMD's SDK (OpenCL 1.1, circa 6/13/2010) and a Radeon HD 5830 or 5850 to run it. The SDK and g++ are not as up to date as they could be. The last time I installed a newer Linux distro to get a newer g++, my code ran at half speed (at least the OpenCL compiles did), so I went back. (I just checked the code on that install: it still runs at half speed, with no difference in threading behavior.) Also, when I say only one thread runs at a time, what actually happens is that it launches all of them, then alternates between two until they finish, then does the next two, and so on. All of the threads do appear to run until the code reaches the program build. I'm not using a callback function in clBuildProgram. I realize a lot could be going wrong here and it's hard to say without the code. :)

I'm pretty sure this problem occurs inside (or in the call to) clBuildProgram. I'm printing the time spent there, and the threads that get postponed report a long compile time for their first compile. The only data shared between these clBuildProgram calls is the device ID, in that each thread's cl_device_id has the same value.
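One way to tell a genuinely slow compile from a serialized one is to record both how long each thread's call took and the wall-clock time for the whole batch. The sketch below does that with a hypothetical `fake_build` standing in for clBuildProgram (it just sleeps). The optional global mutex is an assumption that mimics a driver allowing only one build at a time; it is not a claim about how AMD's implementation works. With the lock enabled, the threads that wait report long "compile" times, matching the symptom above, and the wall-clock total grows to at least the sum of all builds.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

using Clock = std::chrono::steady_clock;
using Ms = std::chrono::milliseconds;

// Process-wide lock, used only when simulating a driver that serializes builds.
std::mutex g_build_lock;

// Hypothetical stand-in for clBuildProgram: "compiles" by sleeping for `work`.
// With serialize == true it holds g_build_lock for the whole build.
void fake_build(Ms work, bool serialize) {
    std::unique_lock<std::mutex> guard(g_build_lock, std::defer_lock);
    if (serialize) guard.lock();
    std::this_thread::sleep_for(work);
}

// Runs one fake build per thread, prints each call's duration as seen by its
// own thread, and returns the wall-clock time for the whole batch in ms.
long long timed_builds(int num_threads, Ms work, bool serialize) {
    assert(num_threads > 0);
    std::vector<long long> per_call_ms(num_threads);
    std::vector<std::thread> threads;
    const auto t0 = Clock::now();
    for (int i = 0; i < num_threads; ++i) {
        threads.emplace_back([&per_call_ms, i, work, serialize] {
            const auto start = Clock::now();
            fake_build(work, serialize);  // time spent waiting for the lock counts here too
            per_call_ms[i] = std::chrono::duration_cast<Ms>(Clock::now() - start).count();
        });
    }
    for (auto& t : threads) t.join();
    const long long wall = std::chrono::duration_cast<Ms>(Clock::now() - t0).count();
    for (int i = 0; i < num_threads; ++i)
        std::printf("thread %d: build call took %lld ms\n", i, per_call_ms[i]);
    std::printf("wall clock: %lld ms (serialize=%d)\n", wall, serialize ? 1 : 0);
    return wall;
}
```

For example, `timed_builds(4, Ms(100), false)` should take roughly 100 ms of wall time, while `timed_builds(4, Ms(100), true)` takes at least 400 ms. Wrapping the real clBuildProgram call the same way would show which of the two patterns the driver follows.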

This is how I'm launching the threads:

    for (a = 0; a < num_threads; a++) {
        threads[a] = std::thread(std::ref(programs[a]));
        threads[a].detach();
        sleep(1);    // give each program's OpenCL init functions time to complete
    }
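The loop above detaches every thread and relies on sleep(1) for pacing, which leaves the caller no direct way to know when the builds are done. A more conventional pattern keeps the std::thread objects and joins them. This is a minimal sketch that assumes each entry of `programs` is a callable object; `ProgramJob` and `run_all` are hypothetical names. It does not by itself change how the driver schedules clBuildProgram. If the one-second stagger exists only because the OpenCL setup misbehaved when run concurrently, doing the shared platform/device lookup once on the main thread before launching is one option to consider.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical job type standing in for the question's `programs[a]`
// callables. Real code would set up OpenCL and call clBuildProgram in
// operator(); this version only records that it ran.
struct ProgramJob {
    int id = 0;
    bool done = false;
    void operator()() {
        assert(!done && "each job should run exactly once");
        done = true;
    }
};

// Starts one thread per job, then waits for all of them. join() tells the
// caller exactly when every job has finished, so neither detach() nor a
// fixed sleep() is needed to keep track of the workers.
void run_all(std::vector<ProgramJob>& jobs) {
    std::vector<std::thread> threads;
    threads.reserve(jobs.size());
    for (auto& job : jobs)
        threads.emplace_back(std::ref(job));  // run the caller's object, not a copy
    for (auto& t : threads)
        t.join();
}
```

Usage: fill a `std::vector<ProgramJob>` and call `run_all(jobs)`; when it returns, every job has completed.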

This is where it bogs down (all the arguments are local variables, though the device ID has the same value in every thread):

    clBuildProgram(program, 1, &device, options, NULL, NULL);

It doesn't seem to make a difference whether each thread has its own context or command_queue. I really suspected this was the problem, which is why I mention it. :)

Update: Spawning child processes with fork() will work for this.
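A minimal sketch of the fork() approach: the parent forks one child per job, each child does its work and exits with a status code, and the parent collects the statuses with waitpid(). `run_in_children` and the `job` callback are hypothetical; in the real program `job` would create its own OpenCL context and call clBuildProgram. It is probably safest to fork before the parent initializes OpenCL or starts any threads, since a child gets only the calling thread plus a copy of the parent's memory. Exit statuses are limited to 0-255, so richer results (binaries, build logs, fitness values) need some form of IPC.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <vector>

// Runs job(i) in its own forked child for i = 0..n-1, with all children
// running at the same time, and returns each child's exit status in index
// order (-1 if fork failed or the child did not exit normally).
std::vector<int> run_in_children(int n, int (*job)(int)) {
    assert(n > 0 && job != nullptr);
    std::vector<pid_t> pids;
    for (int i = 0; i < n; ++i) {
        const pid_t pid = fork();
        if (pid == 0)
            _exit(job(i) & 0xff);  // child: do one job; _exit skips atexit handlers and stdio flushing
        pids.push_back(pid);       // parent: remember the child (-1 if fork failed)
    }
    std::vector<int> statuses;
    for (const pid_t pid : pids) {
        int status = 0;
        if (pid < 0 || waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
            statuses.push_back(-1);
        else
            statuses.push_back(WEXITSTATUS(status));
    }
    return statuses;
}
```

For instance, `run_in_children(4, compile_one)` with a hypothetical `int compile_one(int)` would run four builds in parallel processes.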

Recommended answer

You might want to post something about this on AMD's support forum. Given how many OpenGL implementations have failed to provide the thread consistency the spec requires, it would not surprise me if OpenCL drivers are still suboptimal in that respect. For all we know, they may use the process ID internally to separate data.

If your multi-process approach is working, then I suggest you keep it and communicate results using IPC. You could use Boost.Interprocess (boost::interprocess), which has interesting ways of using serialization (e.g. with boost::spirit to reflect the data structures). Or you could use POSIX pipes or shared memory, or just dump compilation results to files and poll the directory from your parent process using boost::filesystem and directory iterators.
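The "dump results to files and poll the directory" option could look like the following sketch. It uses C++17 std::filesystem in place of boost::filesystem (the directory_iterator interface is very similar, and boost::filesystem would be the choice on the question's g++ 4.6.1). The file layout (`<index>.tmp` renamed to `<index>.result`) and the function names are assumptions for illustration. Writing under a temporary name and then renaming means the parent never reads a half-written file, since rename() within one filesystem is atomic on POSIX.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <map>
#include <string>
#include <thread>

namespace fs = std::filesystem;

// Worker side: write one result under a temporary name, then rename it into
// place so the parent only ever sees complete files.
void write_result(const fs::path& dir, int index, const std::string& text) {
    const fs::path tmp = dir / (std::to_string(index) + ".tmp");
    {
        std::ofstream out(tmp, std::ios::binary);
        out << text;
    }  // stream closed (and flushed) before the rename
    fs::rename(tmp, dir / (std::to_string(index) + ".result"));
}

// Parent side: poll `dir` until `expected` results have been read or `timeout`
// expires; returns what was collected, keyed by job index.
std::map<int, std::string> collect_results(const fs::path& dir, std::size_t expected,
                                           std::chrono::milliseconds timeout) {
    assert(fs::is_directory(dir));
    std::map<int, std::string> results;
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (results.size() < expected && std::chrono::steady_clock::now() < deadline) {
        for (const auto& entry : fs::directory_iterator(dir)) {
            const fs::path& p = entry.path();
            if (p.extension() != ".result") continue;  // skip files still being written
            const int index = std::stoi(p.stem().string());
            if (results.count(index) != 0) continue;   // already read
            std::ifstream in(p, std::ios::binary);
            results[index].assign(std::istreambuf_iterator<char>(in),
                                  std::istreambuf_iterator<char>());
        }
        if (results.size() < expected)
            std::this_thread::sleep_for(std::chrono::milliseconds(20));
    }
    return results;
}
```

In the multi-process setup each child would call `write_result` when its build finishes, while the parent calls `collect_results` with the number of children it spawned.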

Forked processes inherit the parent's open file descriptors, so I believe you can also use unnamed pipes (created with pipe() before calling fork()). That avoids having to create a pipe server that instantiates client pipes, which can require extensive protocol code.
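Because a child created by fork() inherits the parent's file descriptors, an unnamed pipe created just before forking gives a private channel with no naming or server setup. A minimal sketch, assuming each child produces its result (for example a build log) as a string; `run_child_with_pipe` and the `job` callback are hypothetical names. The parent closes its copy of the write end so that read() returns 0 (EOF) once the child has exited.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>

// Forks one child that runs `job` and streams its result back through an
// unnamed pipe created before fork(). Returns true if the child exited with
// status 0; the received bytes are stored in `result`.
bool run_child_with_pipe(std::string (*job)(), std::string& result) {
    assert(job != nullptr);
    int fds[2];  // fds[0] = read end, fds[1] = write end
    if (pipe(fds) != 0) return false;
    const pid_t pid = fork();
    if (pid < 0) { close(fds[0]); close(fds[1]); return false; }
    if (pid == 0) {                       // child: keep only the write end
        close(fds[0]);
        const std::string out = job();
        std::size_t sent = 0;
        while (sent < out.size()) {
            const ssize_t n = write(fds[1], out.data() + sent, out.size() - sent);
            if (n < 0 && errno == EINTR) continue;
            if (n < 0) _exit(1);
            sent += static_cast<std::size_t>(n);
        }
        _exit(0);                         // exiting closes the write end
    }
    close(fds[1]);                        // parent: keep only the read end, so EOF arrives
    result.clear();
    char buf[4096];
    for (;;) {
        const ssize_t n = read(fds[0], buf, sizeof buf);
        if (n > 0) result.append(buf, static_cast<std::size_t>(n));
        else if (n < 0 && errno == EINTR) continue;
        else break;                       // 0 = EOF (child done); <0 = read error
    }
    close(fds[0]);
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return false;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Calling this once per job from a loop would serialize the children; to run builds in parallel, the same pipe-then-fork setup can be done for every child first and the pipes read afterwards.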
