Renderscript fails on GPU-enabled driver if USAGE_SHARED is used


Question

We are using renderscript for audio DSP processing. It is simple and improves performance significantly for our use case. But we run into an annoying issue with USAGE_SHARED on devices that have a custom driver with GPU execution enabled.

As you may know, the USAGE_SHARED flag makes the renderscript allocation reuse the given memory instead of creating a copy of it. As a consequence, it not only saves memory but, in our case, also improves performance to the desired level.

The following code with USAGE_SHARED works fine on the default renderscript driver (libRSDriver.so). With the custom driver (libRSDriver_adreno.so), USAGE_SHARED does not reuse the given memory, so the data is not shared either.

This is the code that makes use of USAGE_SHARED and calls the renderscript kernel:

void process(float* in1, float* in2, float* out, size_t size) {
  sp<RS> rs = new RS();
  rs->init(app_cache_dir);

  sp<const Element> e = Element::F32(rs);
  sp<const Type> t = Type::create(rs, e, size, 0, 0);

  sp<Allocation> in1Alloc = Allocation::createTyped(
                rs, t,
                RS_ALLOCATION_MIPMAP_NONE, 
                RS_ALLOCATION_USAGE_SCRIPT | RS_ALLOCATION_USAGE_SHARED,
                in1);

  sp<Allocation> in2Alloc = Allocation::createTyped(
                rs, t,
                RS_ALLOCATION_MIPMAP_NONE, 
                RS_ALLOCATION_USAGE_SCRIPT | RS_ALLOCATION_USAGE_SHARED,
                in2);

  sp<Allocation> outAlloc = Allocation::createTyped(
                rs, t,
                RS_ALLOCATION_MIPMAP_NONE, 
                RS_ALLOCATION_USAGE_SCRIPT | RS_ALLOCATION_USAGE_SHARED,
                out);

  ScriptC_x* rsX = new ScriptC_x(rs);
  rsX->set_in1Alloc(in1Alloc);
  rsX->set_in2Alloc(in2Alloc);
  rsX->set_size(size);

  rsX->forEach_compute(in1Alloc, outAlloc);
}

NOTE: This variation of Allocation::createTyped() is not mentioned in the documentation, but rsCppStructs.h declares it. It is the allocation factory method that accepts a backing pointer and respects the USAGE_SHARED flag. This is how it is declared:

/**
 * Creates an Allocation for use by scripts with a given Type and a backing pointer. For use
 * with RS_ALLOCATION_USAGE_SHARED.
 * @param[in] rs Context to which the Allocation will belong
 * @param[in] type Type of the Allocation
 * @param[in] mipmaps desired mipmap behavior for the Allocation
 * @param[in] usage usage for the Allocation
 * @param[in] pointer existing backing store to use for this Allocation if possible
 * @return new Allocation
 */
static sp<Allocation> createTyped(
            const sp<RS>& rs, const sp<const Type>& type,
            RsAllocationMipmapControl mipmaps, 
            uint32_t usage, 
            void * pointer);

This is the renderscript kernel:

rs_allocation in1Alloc, in2Alloc;
uint32_t size;

// JUST AN EXAMPLE KERNEL
// Not using reduction kernel since it is only available in later API levels.
// Not sure if support library helps here. Anyways, unrelated to the current problem

float __attribute__((kernel)) compute(float ignored, uint32_t x) {
  float result = 0.0f;
  for (uint32_t i=0; i<size; i++) {
    result += rsGetElementAt_float(in1Alloc, x) * rsGetElementAt_float(in2Alloc, size-i-1); // just an example computation
  }

  return result;
}

As mentioned, out does not contain any of the results of the calculation. Calling syncAll(RS_ALLOCATION_USAGE_SHARED) did not help either.

The following works, though it is much slower:

void process(float* in1, float* in2, float* out, size_t size) {
  sp<RS> rs = new RS();
  rs->init(app_cache_dir);

  sp<const Element> e = Element::F32(rs);
  sp<const Type> t = Type::create(rs, e, size, 0, 0);

  sp<Allocation> in1Alloc = Allocation::createTyped(rs, t);
  in1Alloc->copy1DFrom(in1);

  sp<Allocation> in2Alloc = Allocation::createTyped(rs, t);
  in2Alloc->copy1DFrom(in2);

  sp<Allocation> outAlloc = Allocation::createTyped(rs, t);

  ScriptC_x* rsX = new ScriptC_x(rs);
  rsX->set_in1Alloc(in1Alloc);
  rsX->set_in2Alloc(in2Alloc);
  rsX->set_size(size);

  rsX->forEach_compute(in1Alloc, outAlloc);
  outAlloc->copy1DTo(out);
}

Copying makes it work, but in our testing, copying back and forth significantly degrades performance.

If we switch off GPU execution through the debug.rs.default-CPU-driver system property, the custom driver works well and delivers the desired performance.

Aligning the memory given to renderscript to 16, 32, ..., or 1024 did not make the custom driver respect USAGE_SHARED.
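The question does not show how the buffers were aligned. As a rough sketch only, one way to obtain aligned float buffers in C++17 might look like the following. The helper names allocAlignedFloats and isAligned are hypothetical, and the alignment is assumed to be a power of two given in bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper (not part of RenderScript): returns storage for `count`
// floats aligned to `alignment` bytes, or nullptr on failure.
// The caller releases the buffer with std::free().
float* allocAlignedFloats(std::size_t count, std::size_t alignment) {
    // The alignment must be a non-zero power of two.
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    std::size_t bytes = count * sizeof(float);
    // std::aligned_alloc requires the size to be a multiple of the alignment,
    // so round the byte count up to the next multiple.
    std::size_t padded = (bytes + alignment - 1) / alignment * alignment;
    if (padded == 0) padded = alignment;
    return static_cast<float*>(std::aligned_alloc(alignment, padded));
}

// Checks that a pointer sits on an `alignment`-byte boundary.
bool isAligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

A buffer obtained this way could then be passed as the backing pointer to the createTyped() variant shown earlier. As reported above, alignment alone did not change the driver's behavior.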

So, our question is this: how can we make this kernel work on devices that use a custom renderscript driver with GPU execution enabled?

Recommended answer

You need to keep the copy even if you use USAGE_SHARED.

USAGE_SHARED is just a hint to the driver; the driver does not have to honor it.

If the driver does share the memory, the copy will be ignored and performance will be the same.
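To illustrate why keeping the copy costs nothing when sharing works, here is a small self-contained model. It is not the RenderScript API: MockAllocation, its driverShares flag, and runDoubling are hypothetical names invented for this sketch, and the "kernel" is a plain loop. The model assumes that a copy whose source or destination is already the backing store is skipped, which is what the answer describes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for an allocation; NOT the RenderScript API.
class MockAllocation {
public:
    // `driverShares` models whether the driver honors the sharing hint.
    MockAllocation(float* userPtr, std::size_t count, bool driverShares)
        : count_(count) {
        assert(userPtr != nullptr);
        if (driverShares) {
            data_ = userPtr;        // reuse the caller's memory
        } else {
            own_.resize(count);     // driver ignores the hint: private storage
            data_ = own_.data();
        }
    }
    MockAllocation(const MockAllocation&) = delete;
    MockAllocation& operator=(const MockAllocation&) = delete;

    // Copies into the allocation; returns false when the copy was skipped
    // because the source already is the backing store.
    bool copyFrom(const float* src) {
        if (src == data_) return false;
        std::memcpy(data_, src, count_ * sizeof(float));
        return true;
    }

    // Copies out of the allocation; skipped in the same way.
    bool copyTo(float* dst) const {
        if (dst == data_) return false;
        std::memcpy(dst, data_, count_ * sizeof(float));
        return true;
    }

    float* data() { return data_; }

private:
    std::size_t count_;
    std::vector<float> own_;
    float* data_ = nullptr;
};

// Always issues both copies, as the answer recommends, and returns how many
// of them actually moved data. The loop stands in for a kernel launch.
int runDoubling(float* in, float* out, std::size_t n, bool driverShares) {
    MockAllocation inAlloc(in, n, driverShares);
    MockAllocation outAlloc(out, n, driverShares);
    int copies = 0;
    copies += inAlloc.copyFrom(in) ? 1 : 0;
    for (std::size_t i = 0; i < n; ++i) {
        outAlloc.data()[i] = 2.0f * inAlloc.data()[i];
    }
    copies += outAlloc.copyTo(out) ? 1 : 0;
    return copies;
}
```

Applied to the question's process(), this would mean keeping the createTyped() variant with USAGE_SHARED and the backing pointer, and still calling copy1DFrom() on the inputs before the kernel and copy1DTo() on the output afterwards, so the result is correct whether or not the driver honors the hint.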
