What is the availability of 'vector long long'?
Question
I'm testing on an old PowerMac G5, which is a Power4 machine. The build is failing:
$ make
...
g++ -DNDEBUG -g2 -O3 -mcpu=power4 -maltivec -c ppc-simd.cpp
ppc-crypto.h:36: error: use of 'long long' in AltiVec types is invalid
make: *** [ppc-simd.o] Error 1
The failure is due to:
typedef __vector unsigned long long uint64x2_p8;
I'm having trouble determining when I should make the typedef available. With -mcpu=power4 -maltivec the machine reports 64-bit availability:
$ gcc -mcpu=power4 -maltivec -dM -E - </dev/null | sort | egrep -i -E 'power|ARCH'
#define _ARCH_PPC 1
#define _ARCH_PPC64 1
#define __POWERPC__ 1
The OpenPOWER | 6.1. Vector Data Types manual has good information on vector data types, but it does not discuss when the vector long long types are available.
What is the availability of __vector unsigned long long? When can I use the typedef?
Recommended answer
TL;DR: it looks like POWER7 is the minimum requirement for 64-bit element size with AltiVec. This is part of VSX (Vector Scalar Extension), which Wikipedia confirms first appeared in POWER7.
It's very likely that gcc knows what it's doing, and enables 64-bit element-size vector intrinsics at the lowest -mcpu= level that supports them.
#include <altivec.h>
auto vec32(void) { // compiles with your options: Power4
return vec_splats((int) 1);
}
// gcc error: use of 'long long' in AltiVec types is invalid without -mvsx
vector long long vec64(void) {
return vec_splats((long long) 1);
}
(With auto instead of vector long long as the return type, the 2nd function compiles without VSX and returns its value in two 64-bit integer registers.)
Adding -mvsx lets the 2nd function compile. Using -mcpu=power7 also works, but -mcpu=power6 doesn't.
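One way to act on this in a header such as ppc-crypto.h is to guard the typedef on the compiler's own feature macro rather than on the CPU name. The following is only a sketch: it assumes the compiler predefines __VSX__ whenever VSX is enabled (GCC does this for -mvsx, which -mcpu=power7 turns on), and the names PPC_HAS_VEC64 and has_vec64() are hypothetical, introduced here for illustration.

```cpp
// Guard the 64-bit-element vector typedef on VSX support.
// Assumption: the compiler predefines __VSX__ when VSX is enabled
// (for example via -mvsx or -mcpu=power7), as GCC does.
#if defined(__VSX__)
# define PPC_HAS_VEC64 1   // hypothetical macro name
typedef __vector unsigned long long uint64x2_p8;
#else
# define PPC_HAS_VEC64 0   // no VSX: POWER4..POWER6, or a non-POWER target
#endif

// Hypothetical helper so other code can branch on availability.
constexpr bool has_vec64() { return PPC_HAS_VEC64 != 0; }
```

With -mcpu=power4 -maltivec the typedef is simply skipped, so the header still compiles and 64-bit code paths can be selected with #if PPC_HAS_VEC64.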
Source + asm on Godbolt (PowerPC64 gcc 6.3)
# with auto without VSX:
vec64(): # -O3 -mcpu=power4 -maltivec -mregnames
li %r4,1
li %r3,1
blr
vec64(): # -O3 -mcpu=power7 -maltivec -mregnames
.LCF2:
0: addis 2,12,.TOC.-.LCF2@ha
addi 2,2,.TOC.-.LCF2@l
addis %r9,%r2,.LC0@toc@ha
addi %r9,%r9,.LC0@toc@l # PC-relative addressing for static constant, I think.
lxvd2x %vs34,0,%r9 # vector load?
xxpermdi %vs34,%vs34,%vs34,2
blr
.LC0: # in .rodata
.quad 1
.quad 1
And BTW, vec_splats (splat scalar) with a constant compiles to a single instruction. But with a runtime variable (e.g. a function arg), it compiles to an integer store / vector load / vector-splat (like the vec_splat intrinsic). Apparently there isn't a single instruction for int->vec.
The vec_splat_s32 and related intrinsics only accept a small (5-bit) constant, so they only compile in cases where the compiler can use the corresponding splat-immediate instruction.
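To make the 5-bit limit concrete: a signed 5-bit immediate covers the range -16 to 15. The helper below is a hypothetical illustration (fits_splat_immediate is not part of altivec.h) of the check one could apply before choosing vec_splat_s32 over vec_splats for a compile-time constant.

```cpp
// Hypothetical helper: true if v fits the 5-bit signed immediate
// (range -16..15) that the splat-immediate intrinsics accept.
constexpr bool fits_splat_immediate(long long v) {
    return v >= -16 && v <= 15;
}
```

For example, fits_splat_immediate(1) holds, so a constant 1 could use the immediate form, while a constant such as 100 would need vec_splats or a load from memory.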
This Intel SSE to PowerPC AltiVec migration guide looks mostly good, but gets that wrong (it claims that vec_splats splats a signed byte).