What is the availability of 'vector long long'?

Question

I'm testing on an old PowerMac G5, which is a Power4 machine. The build is failing:

$ make
...
g++ -DNDEBUG -g2 -O3 -mcpu=power4 -maltivec -c ppc-simd.cpp
ppc-crypto.h:36: error: use of 'long long' in AltiVec types is invalid
make: *** [ppc-simd.o] Error 1

The failure is due to:

typedef __vector unsigned long long uint64x2_p8;

I'm having trouble determining when I should make the typedef available. With -mcpu=power4 -maltivec, the compiler reports 64-bit availability:

$ gcc -mcpu=power4 -maltivec -dM -E - </dev/null | sort | egrep -i -E 'power|ARCH'
#define _ARCH_PPC 1
#define _ARCH_PPC64 1
#define __POWERPC__ 1

The OpenPOWER | 6.1. Vector Data Types manual has good information on vector data types, but it does not discuss when vector long long is available.

What is the availability of __vector unsigned long long? When can I use the typedef?

Recommended answer

TL;DR: it looks like POWER7 is the minimum requirement for 64-bit element size with AltiVec. This is part of VSX (Vector Scalar Extension), which Wikipedia confirms first appeared in POWER7.

It's very likely that gcc knows what it's doing, and enables 64-bit element-size vector intrinsics with the lowest necessary -mcpu= requirement.

#include <altivec.h>

auto vec32(void) {       // compiles with your options: Power4
    return vec_splats((int) 1);
}

// gcc error: use of 'long long' in AltiVec types is invalid without -mvsx
vector long long vec64(void) {
    return vec_splats((long long) 1);
}

(With auto instead of vector long long, the 2nd function compiles to returning in two 64-bit integer registers.)

Adding -mvsx lets the 2nd function compile. Using -mcpu=power7 also works, but power6 doesn't.
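
One way to act on this is to gate the typedef on the compiler's own feature macro rather than on the CPU name. The sketch below assumes GCC (or a compatible compiler) defines `__VSX__` whenever `-mvsx`, or `-mcpu=power7` and later, is in effect. `PPC_HAS_U64_VECTOR` and `has_u64_vector()` are hypothetical names introduced only for illustration.

```cpp
#if defined(__ALTIVEC__)
#  include <altivec.h>
#endif

#if defined(__VSX__)
// VSX is enabled, so 64-bit vector elements are accepted.
typedef __vector unsigned long long uint64x2_p8;
#  define PPC_HAS_U64_VECTOR 1   // hypothetical macro name
#else
// No VSX (e.g. -mcpu=power4 -maltivec): skip the typedef entirely.
#  define PPC_HAS_U64_VECTOR 0
#endif

// Hypothetical helper: reports which branch was compiled in.
inline int has_u64_vector() { return PPC_HAS_U64_VECTOR; }
```

With `-mcpu=power4 -maltivec` the typedef should be skipped, and any code that needs `uint64x2_p8` can be wrapped in `#if PPC_HAS_U64_VECTOR`.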

Source + asm on Godbolt (PowerPC64 gcc 6.3)

# with auto without VSX:
vec64():     # -O3 -mcpu=power4 -maltivec -mregnames
    li %r4,1
    li %r3,1
    blr

vec64():  # -O3 -mcpu=power7 -maltivec -mregnames
.LCF2:
0:  addis 2,12,.TOC.-.LCF2@ha
    addi 2,2,.TOC.-.LCF2@l
    addis %r9,%r2,.LC0@toc@ha
    addi %r9,%r9,.LC0@toc@l       # PC-relative addressing for static constant, I think.
    lxvd2x %vs34,0,%r9            # vector load?
    xxpermdi %vs34,%vs34,%vs34,2
    blr


 .LC0:    # in .rodata
    .quad   1
    .quad   1

And BTW, vec_splats (splat scalar) with a constant compiles to a single instruction. But with a runtime variable (e.g. a function arg), it compiles to an integer store / vector load / vector-splat (like the vec_splat intrinsic). Apparently there isn't a single instruction for int->vec.

The vec_splat_s32 and related intrinsics only accept a small (5-bit) constant, so they only compile in cases where the compiler can use the corresponding splat-immediate instruction.
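
To make the constant-versus-runtime distinction concrete, here is a small sketch. It assumes the splat-immediate field is a signed 5-bit value, so literals from -16 to 15 qualify. `splat_one`, `splat_any`, and `fits_splat_immediate` are hypothetical names, and the AltiVec part only compiles when `__ALTIVEC__` is defined.

```cpp
#if defined(__ALTIVEC__)
#  include <altivec.h>

// Small literal: eligible for a single splat-immediate instruction.
inline __vector signed int splat_one() { return vec_splat_s32(1); }

// Runtime value: per the answer, expect an integer store, a vector
// load, and a vector splat rather than one instruction.
inline __vector signed int splat_any(int x) { return vec_splats(x); }
#endif

// Hypothetical helper: 1 if v fits a signed 5-bit immediate, else 0.
inline int fits_splat_immediate(long v) {
    return (v >= -16 && v <= 15) ? 1 : 0;
}
```

The helper is only a portable illustration of the range check; the compiler enforces the real constraint when it sees `vec_splat_s32`.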

This Intel SSE to PowerPC AltiVec migration guide looks mostly good, but gets that wrong (it claims that vec_splats splats a signed byte).
