Why did Google choose RenderScript instead of OpenCL?

Question

I had been wondering whether it was possible to use OpenCL on Android, found out that it wasn't, and dropped the subject altogether. But thanks to the January 14th post on the official Android Developers blog (http://android-developers.blogspot.fr/2013/01/evolution-of-renderscript-performance.html), I discovered that parallel programming has been possible since Android 4.0 through RenderScript, an API that has quite a few features in common with OpenCL.

What I'm wondering now is: why did Google choose to implement this new solution instead of pushing OpenCL forward (an open specification now handled by the Khronos Group)?

I mean, I know it's not really hard to convert from one to the other, but still...

Anyway, if anyone has the real explanation, please let me know!

Recommended answer

The answer is that Android's needs are very different from what OpenCL tries to provide.

OpenCL uses the execution model first introduced in CUDA. In this model, a kernel is made up of one or many groups of workers, and each group has fast shared memory and synchronization primitives that work within that group. As a result, the description of an algorithm becomes intermingled with how that algorithm should be scheduled on a particular architecture, because you decide the size of a group and when to synchronize within it.
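To make the coupling concrete, here is a minimal CPU-side sketch of a work-group sum reduction, written in Python rather than as real CUDA or OpenCL code. The names `GROUP_SIZE`, `group_sum`, and `launch` are hypothetical, and the group size of 4 is chosen only to keep the example small. Note how the group size and the synchronization points are part of the algorithm text itself.

```python
# Emulation of a CUDA/OpenCL-style work-group sum reduction on the CPU.
# GROUP_SIZE is fixed by the kernel author, not by the runtime.
# Assumption: it must be a power of two for the tree reduction below.
GROUP_SIZE = 4  # hypothetical value; the answer's example uses 256


def group_sum(data, group_id):
    """Sum one group's slice using a tree reduction in 'shared memory'."""
    start = group_id * GROUP_SIZE
    # Each worker loads one element (or 0 past the end) into shared memory.
    shared = [data[start + w] if start + w < len(data) else 0
              for w in range(GROUP_SIZE)]
    stride = GROUP_SIZE // 2
    while stride > 0:
        # Workers with id < stride each add their partner's value.
        for worker in range(stride):
            shared[worker] += shared[worker + stride]
        # A real kernel needs a group-wide barrier here; in this
        # sequential emulation the loop order stands in for it.
        stride //= 2
    return shared[0]


def launch(data):
    """Run enough groups to cover the input and add their partial sums."""
    num_groups = (len(data) + GROUP_SIZE - 1) // GROUP_SIZE
    return sum(group_sum(data, g) for g in range(num_groups))
```

Changing `GROUP_SIZE` changes the size of the shared array, the number of reduction steps, and the number of barriers, which is why a runtime cannot silently pick a different value for code written this way.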

That's great when you're writing for one architecture and you want absolute peak performance, but it gets peak performance at the expense of performance portability. Maybe on your architecture you have enough registers and shared memory to run 256 workers per group for best performance, but on another architecture you'd end up with massive register spills with anything above 128 workers per group, causing an 80% performance regression. Meanwhile, because your code is written explicitly for 256 workers per group, the runtime can't do anything to try to improve the situation on another architecture; it has to obey what you've written. This sort of situation is common when moving from architecture to architecture on the desktop/HPC side of GPU compute.
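The register-pressure argument can be sketched with some back-of-the-envelope arithmetic. This is a deliberately simplified model, not how any real GPU allocates registers, and every number in it (32 registers per worker, register files of 8192 and 4096 entries) is invented for illustration so that the 256 and 128 thresholds from the example fall out.

```python
def registers_spilled(group_size, regs_per_worker, register_file_size):
    """Registers a group needs beyond what the register file can hold.

    Simplified model: every worker in a group needs the same number of
    registers, and anything that does not fit is spilled to slower memory.
    """
    needed = group_size * regs_per_worker
    return max(0, needed - register_file_size)


# Hypothetical figures, chosen only to mirror the 256 vs. 128 example.
REGS_PER_WORKER = 32
ARCH_A_REGISTERS = 8192  # room for 256 workers at 32 registers each
ARCH_B_REGISTERS = 4096  # room for 128 workers at 32 registers each

# The kernel was tuned for architecture A with 256 workers per group.
spills_on_a = registers_spilled(256, REGS_PER_WORKER, ARCH_A_REGISTERS)
spills_on_b = registers_spilled(256, REGS_PER_WORKER, ARCH_B_REGISTERS)
# A group of 128 would fit on B, but the kernel hard-codes 256.
spills_on_b_if_128 = registers_spilled(128, REGS_PER_WORKER, ARCH_B_REGISTERS)
```

Under these assumptions the same kernel fits exactly on architecture A and spills half of its registers on architecture B, and only a rewrite with a smaller group size removes the spills.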

On mobile, Android needs performance portability between many different GPU and CPU vendors with very different architectures. If Android were to rely on a CUDA-style execution model, it would be almost impossible to write a single kernel and have it run acceptably on a range of devices.

RenderScript abstracts control over scheduling away from the developer at the cost of some peak performance; however, we're constantly closing the gap in terms of what's possible with RenderScript. For example, ScriptGroup, an API introduced in Android 4.2, is a big part of our plans to further improve GPU code generation. There are plenty of new improvements coming that make writing fast code even easier, too.
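The following sketch illustrates the general idea of leaving scheduling to the runtime. It is not the RenderScript API: `invert` and `for_each` are hypothetical names, and the chunk size stands in for whatever partitioning a real runtime might choose for a given device. The developer writes only the per-element function, and the result does not depend on how the work is split.

```python
def invert(pixel):
    """Per-element kernel: the only code the developer writes."""
    return 255 - pixel


def for_each(kernel, data, chunk_size):
    """Hypothetical runtime: applies the kernel in chunks of its choosing.

    chunk_size is a scheduling decision made by the runtime, not by the
    kernel author; it must be a positive integer.
    """
    out = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        out.extend(kernel(x) for x in chunk)
    return out


pixels = [0, 100, 255, 17, 42]
# Two "devices" that partition the work differently get the same result.
result_small_chunks = for_each(invert, pixels, 2)
result_large_chunks = for_each(invert, pixels, 4)
```

Because the kernel says nothing about group sizes or synchronization, the runtime is free to change the partitioning per device, at the cost of the fine-grained control that the work-group model offers.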
