Neon intrinsics: swap elements in a vector


Question

I would like to optimize the code below with Neon intrinsics. Basically, given an input of

0 1 2 3 4 5 6 7 8


it should produce the output

2 1 0 5 4 3 8 7 6


#include <stdint.h>

/* size is the number of 3-byte groups: each iteration consumes 3 bytes. */
void func(uint8_t* src, uint8_t* dst, int size){

   for (int i = 0; i < size; i++){
     dst[0] = src[2];
     dst[1] = src[1];
     dst[2] = src[0];
     dst = dst+3;
     src = src+3;
   }
}

The only way I can think of is to use

uint8x8x3_t v = vld3_u8(src);

to get 3 vectors, and then access every single element of v.val[2], v.val[1] and v.val[0] and write it to memory.

Can anyone help?

Thanks.

Recommended answer

This is dead easy in the underlying instruction set, because you're swapping two elements of a 3-element structure, which practically spells out the relevant instructions already:

vld3.u8 {d0-d2}, [r0]
vswp d0, d2
vst3.u8 {d0-d2}, [r0]

There's even this exact example in the NEON Programmer's Guide, because it's an RGB-BGR conversion, and that's exactly the kind of processing NEON was designed for.
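To make the three instructions concrete, here is a plain-C model of what they do to one block of 8 triplets (24 bytes). It is only an illustration: the function name swap_block_model and the arrays d0, d1, d2 (standing in for the NEON registers) are made up for this sketch and are not part of any API.

```c
#include <stdint.h>

/* Plain-C model of vld3.u8 / vswp / vst3.u8 on one block of 8 triplets.
   d0, d1, d2 stand in for the NEON d-registers (illustrative names). */
static void swap_block_model(const uint8_t *src, uint8_t *dst)
{
    uint8_t d0[8], d1[8], d2[8];

    /* vld3.u8: de-interleave, so byte 0 of each triplet lands in d0,
       byte 1 in d1 and byte 2 in d2. */
    for (int i = 0; i < 8; i++) {
        d0[i] = src[3 * i + 0];
        d1[i] = src[3 * i + 1];
        d2[i] = src[3 * i + 2];
    }

    /* vswp d0, d2: exchange the contents of the two registers. */
    for (int i = 0; i < 8; i++) {
        uint8_t t = d0[i];
        d0[i] = d2[i];
        d2[i] = t;
    }

    /* vst3.u8: re-interleave the three registers on the way to memory. */
    for (int i = 0; i < 8; i++) {
        dst[3 * i + 0] = d0[i];
        dst[3 * i + 1] = d1[i];
        dst[3 * i + 2] = d2[i];
    }
}
```

The point of the model is that the load and store already do the per-element shuffling, so the only real work left is exchanging two whole registers.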

With intrinsics it's a bit trickier, as there's no intrinsic for vswp; you just have to express it in C and trust the compiler to do the right thing:

uint8x8x3_t data = vld3_u8(src);
uint8x8_t tmp = data.val[0];
data.val[0] = data.val[2];
data.val[2] = tmp;
vst3_u8(dest, data);
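For completeness, the snippet above could be wrapped into a full function roughly like this. It is a sketch under some assumptions: the name swap_rb and the parameter count (the number of 3-byte groups, matching how the loop in the question uses size) are made up here, and the scalar fallback is an addition so that the code still builds on targets where the compiler does not define __ARM_NEON.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Swap the first and third byte of every 3-byte group.
   count is the number of 3-byte groups (assumed meaning of the
   question's size parameter). */
void swap_rb(const uint8_t *src, uint8_t *dst, size_t count)
{
    size_t i = 0;

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Main loop: 8 groups (24 bytes) per iteration. */
    for (; i + 8 <= count; i += 8) {
        uint8x8x3_t data = vld3_u8(src + 3 * i);  /* de-interleaving load */
        uint8x8_t tmp = data.val[0];
        data.val[0] = data.val[2];
        data.val[2] = tmp;
        vst3_u8(dst + 3 * i, data);               /* interleaving store */
    }
#endif

    /* Scalar tail for the remaining groups (or for all of them when NEON
       is unavailable). All three bytes are read before any is written,
       so dst == src also works. */
    for (; i < count; i++) {
        uint8_t a = src[3 * i + 0];
        uint8_t b = src[3 * i + 1];
        uint8_t c = src[3 * i + 2];
        dst[3 * i + 0] = c;
        dst[3 * i + 1] = b;
        dst[3 * i + 2] = a;
    }
}
```

Calling swap_rb(src, dst, 3) on the input 0 1 2 3 4 5 6 7 8 from the question goes entirely through the scalar tail, since fewer than 8 groups are available; the vector loop only pays off on larger buffers.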

That said, with the compilers to hand being various versions of GCC, I failed to convince any of them to actually emit a vswp; code generation ranged from suboptimal to idiotic. Clang did a lot better, but still no vswp; other compilers may be cleverer.
