Why do the Google image-processing Renderscript samples run slower on the GPU on a Nexus 5?

Views: 200 | Published: 2015/12/2 22:31:48 | Tags: android, gpgpu, renderscript


Question

I'd like to thank Stephen for the very quick reply to my previous post. This is a follow-up question to that post: "Why very simple Renderscript runs 3 times slower in GPU than in CPU" (http://stackoverflow.com/questions/20381691/why-very-simple-renderscript-runs-3-times-slower-in-gpu-than-in-cpu).

My development platform is as follows:

Development OS: Windows 7 32-bit
Phone: Nexus 5
Phone OS version: Android 4.4
SDK bundle: adt-bundle-windows-x86-20131030
Build-tool version: 19
SDK tool version: 22.3
Platform tool version: 19

To evaluate the performance of Renderscript GPU compute, and to get a feel for the general tricks that make Renderscript code faster, I ran the following test.

I checked out the code from Google's Android Open Source Project, using the tag android-4.2.2_r1.2. I used this tag simply because the ImageProcessing test sample is not available in newer versions.

Then I used the project under "base\tests\RenderScriptTests\ImageProcessing" for the test. I recorded the running time of each kernel on the GPU as well as on the CPU; the results are listed below.

Test                         GPU        CPU
Levels Vec3 Relaxed          7.45ms     14.89ms
Levels Vec4 Relaxed          6.04ms     12.85ms
Levels Vec3 Full             N/A        28.97ms
Levels Vec4 Full             N/A        35.65ms
Blur radius 25               203.2ms    245.60ms
Greyscale                    7.16ms     11.54ms
Grain                        33.33ms    21.73ms
Fisheye Full                 N/A        51.55ms
Fisheye Relaxed              92.90ms    45.34ms
Fisheye Approx Full          N/A        51.65ms
Fisheye Approx Relaxed       93.09ms    39.11ms
Vignette Full                N/A        44.17ms
Vignette Relaxed             8.02ms     46.68ms
Vignette Approx Full         N/A        45.04ms
Vignette Approx Relaxed      8.20ms     43.69ms
Convolve 3x3                 37.66ms    16.81ms
Convolve 3x3 Intrinsics      N/A        4.57ms
ColorMatrix                  5.87ms     8.26ms
ColorMatrix Intrinsics       N/A        2.70ms
ColorMatrix Intrinsics Grey  N/A        2.52ms
Copy                         5.59ms     2.40ms
CrossProcess (using LUT)     N/A        5.74ms
Convolve 5x5                 84.25ms    46.59ms
Convolve 5x5 Intrinsics      N/A        9.69ms
Mandelbrot                   N/A        50.2ms
Blend Intrinsics             N/A        21.80ms

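The N/A entries in the table are cases where either full precision or an RS intrinsic does not run on the GPU. Of the 13 algorithms that do run on the GPU, 6 run slower there than on the CPU. Since this code was written by Google, I think this is worth investigating. At the very least, the statement "I assume the code will run faster on GPU", which I saw in "Renderscript and the GPU", does not hold here.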
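The "6 of 13" count can be checked directly from the table. The following is a minimal sketch in plain C (not Renderscript); the struct and function names are made up for illustration, and the timings are copied from the rows above that have a GPU figure.

```c
#include <stddef.h>

/* One row of the benchmark table; only rows with a GPU timing are listed. */
struct timing {
    const char *name;
    double gpu_ms;
    double cpu_ms;
};

/* Timings copied from the table above (hypothetical variable name). */
const struct timing gpu_rows[] = {
    {"Levels Vec3 Relaxed",       7.45,  14.89},
    {"Levels Vec4 Relaxed",       6.04,  12.85},
    {"Blur radius 25",          203.2,  245.60},
    {"Greyscale",                 7.16,  11.54},
    {"Grain",                    33.33,  21.73},
    {"Fisheye Relaxed",          92.90,  45.34},
    {"Fisheye Approx Relaxed",   93.09,  39.11},
    {"Vignette Relaxed",          8.02,  46.68},
    {"Vignette Approx Relaxed",   8.20,  43.69},
    {"Convolve 3x3",             37.66,  16.81},
    {"ColorMatrix",               5.87,   8.26},
    {"Copy",                      5.59,   2.40},
    {"Convolve 5x5",             84.25,  46.59},
};

enum { GPU_ROW_COUNT = sizeof gpu_rows / sizeof gpu_rows[0] };

/* Returns how many of the given kernels took longer on the GPU than on the CPU. */
size_t count_slower_on_gpu(const struct timing *rows, size_t n) {
    size_t slower = 0;
    for (size_t i = 0; i < n; ++i) {
        if (rows[i].gpu_ms > rows[i].cpu_ms) {
            ++slower;
        }
    }
    return slower;
}

/* GPU time divided by CPU time; a value above 1.0 means the GPU was slower. */
double gpu_to_cpu_ratio(const struct timing *row) {
    return row->gpu_ms / row->cpu_ms;
}
```

Calling `count_slower_on_gpu(gpu_rows, GPU_ROW_COUNT)` walks the 13 rows, and `gpu_to_cpu_ratio` gives a per-kernel slowdown factor, which is handy for ranking the worst cases (the convolutions and fisheye variants) against the best ones (the vignette variants).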

I investigated some of the algorithms in the list, and I'd like to mention two.

In Vignette, the performance on the GPU is much better. I found that this is due to calls to several functions from rs_cl.rsh. If I comment out those functions, the CPU runs faster (see my previous question, linked at the top, for an extreme case). So the question is why this happens. Most of the functions in rs_cl.rsh are math related, e.g. exp, log, cos, etc. Why do such functions run so much faster on the GPU? Is it because their implementations are actually highly parallel, or just because the version that runs on the GPU is better implemented than the version that runs on the CPU?

Another example is conv3x3 and conv5x5. Although there are cleverer implementations than Google's version in this test app, I think Google's implementation is certainly not bad. It tries to minimize the number of additions and uses convenience functions from rs_cl.rsh such as convert_float4(). So at a glance, I assumed it would run faster on the GPU. However, it runs a lot slower (on the Nexus 4 and Nexus 5, both of which use Qualcomm GPUs). I think this example is very representative, because the algorithm needs to access the pixels near the current pixel, and that kind of access is quite common in image processing algorithms. If something like 2D convolution can't be made faster on the GPU, I suspect many other algorithms will suffer in the same way. I would highly appreciate it if you could identify where the problem is and suggest some ways to make such algorithms faster.
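To make the access pattern concrete, here is a minimal reference sketch of a 3x3 convolution in plain C. It is not Google's Renderscript kernel and not GPU code; the function names, the interleaved RGBA8 layout, and the clamp-to-edge border handling are assumptions chosen for illustration. The point is that every output pixel reads nine neighboring input pixels and converts them to float before accumulating, which is the kind of neighborhood access described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp an index into [0, limit - 1] so border pixels reuse the edge value. */
static int clamp_index(int v, int limit) {
    if (v < 0) return 0;
    if (v >= limit) return limit - 1;
    return v;
}

/* Convert a float sum back to an 8-bit channel, saturating at 0 and 255. */
static uint8_t to_u8(float v) {
    if (v <= 0.0f) return 0;
    if (v >= 255.0f) return 255;
    return (uint8_t)(v + 0.5f);
}

/*
 * Reference 3x3 convolution over an interleaved RGBA8 image.
 * coeff holds the nine weights in row-major order (top-left first).
 * Each output pixel performs 9 neighbor reads of 4 channels each.
 */
void convolve3x3_rgba(const uint8_t *src, uint8_t *dst,
                      int width, int height, const float coeff[9]) {
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float sum[4] = {0.0f, 0.0f, 0.0f, 0.0f};
            for (int ky = -1; ky <= 1; ++ky) {
                int sy = clamp_index(y + ky, height);
                for (int kx = -1; kx <= 1; ++kx) {
                    int sx = clamp_index(x + kx, width);
                    const uint8_t *p = src + 4 * ((size_t)sy * width + sx);
                    float w = coeff[(ky + 1) * 3 + (kx + 1)];
                    for (int c = 0; c < 4; ++c) {
                        sum[c] += w * (float)p[c];
                    }
                }
            }
            uint8_t *out = dst + 4 * ((size_t)y * width + x);
            for (int c = 0; c < 4; ++c) {
                out[c] = to_u8(sum[c]);
            }
        }
    }
}
```

With an identity kernel (only the center weight set to 1) the output equals the input, which is a quick sanity check. A 5x5 version follows the same shape with 25 reads per pixel, so the ratio of memory reads to arithmetic grows with the kernel size.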

The more general question is this: given the test results above, what guidelines should people follow to get higher performance and avoid performance degradation as much as possible? After all, performance is the second most important goal of Renderscript, and I think the portability of RS is already quite good.

Thank you!
