Error: invalid use of vector register at operand 1
I'm learning GCC inline assembler under ARM on a 64-bit AArch64 device. I'm seeing an error message I don't quite understand. The error message from GCC's inline assembler:
$ gcc -DNDEBUG -g3 -O1 -march=armv8-a+crc+crypto test.cc -o test.exe
/tmp/ccCHOWrn.s: Assembler messages:
/tmp/ccCHOWrn.s:19: Error: invalid use of vector register at operand 1 -- `pmull v0,v0,v0'
The sample program simply tries to exercise the polynomial multiply:
$ cat test.cc
#include <arm_neon.h>
int main(int argc, char* argv[])
{
    uint64x2_t r = {0,0}, a = {2,4};
    __asm__ __volatile__
    (
        "pmull %0, %1, %1;"
        : "=w" (r)
        : "w" (a), "w" (a)
        : "cc"
    );
    return (int)r[0];
}
The "w" is an AArch64 machine constraint. In this case, it's described as "Floating point or SIMD vector register", which seems to be what I want.
The uint64x2_t type is typically used with ARM intrinsics. But it's a 128-bit type and aligned for the SIMD coprocessor, so it seemed like a good choice for the sample.
The device is a LeMaker HiKey with a Linaro image and the GCC 4.9.2 compiler. It looks like this bug was fixed a couple of years ago, but I'm not sure if it's related: fbb ftbfs on arm64.
I have two questions:
- What does the error mean, and how can I fix it?
- Is there an intrinsic for pmull and pmull2?
I tried adding the arrangement specifiers, but I'm not surprised it did not work since I don't know the syntax:
$ gcc -DNDEBUG -g3 -O1 -march=armv8-a+crc+crypto test.cc -o test.exe
test.cc: In function ‘int main(int, char**)’:
test.cc:8:15: error: expected ‘)’ before numeric constant
: "=w" (r.1q)
^
test.cc:8:15: error: expected ‘)’ before numeric constant
test.cc:9:6: error: expected ‘;’ before ‘:’ token
: "w" (a.1d), "w" (a.1d)
^
test.cc:9:6: error: expected primary-expression before ‘:’ token
I also tried adding double percent signs (i.e., %%0 and %%1), since the assembler was having trouble with .att_syntax and .intel_syntax:
$ gcc -DNDEBUG -g3 -O1 -march=armv8-a+crc+crypto test.cc -o test.exe
/tmp/ccPpnvUP.s: Assembler messages:
/tmp/ccPpnvUP.s:19: Error: operand 1 should be a SIMD vector register -- `pmull %0,%1,%1'
What does the error mean, and how can I fix it?
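The error comes from the assembler rather than from a violated constraint: a plain %0 expands to a bare register name (the v0 in `pmull v0,v0,v0'), while pmull expects an arrangement specifier on every vector register, such as v0.1q or v0.1d. The specifiers belong in the template string, not in the C operand expressions. It looks like the following does the trick: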
$ cat test.cc
#include <arm_neon.h>
#include <stdio.h>
int main(int argc, char* argv[])
{
    uint64x2_t r = {0,0}, a = {2,4};
    __asm__ __volatile__
    (
        "pmull %0.1q, %1.1d, %1.1d;"
        : "=w" (r)
        : "w" (a[0]), "w" (a[1])
        : "cc"
    );
    /* r[0] and r[1] are 64-bit lanes, so print them as unsigned long long */
    fprintf(stdout, "%llu, %llu\n", (unsigned long long)r[0], (unsigned long long)r[1]);
    return 0;
}
Compiling and running it:
$ gcc -march=armv8-a+crc+crypto test.cc -o test.exe
$ ./test.exe
4, 0
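The "4, 0" can be checked by hand. pmull is a polynomial (carry-less) multiply, so partial products are combined with XOR instead of addition; 2 times 2 is x times x, which is x^2, or 4, in the low half, with nothing in the high half. Below is a minimal portable sketch of that arithmetic, handy as a reference on a machine without the instruction. The names Poly128 and clmul64_ref are made up for this illustration and are not part of any library.

```cpp
#include <cstdint>

// Hypothetical helper type: the 128-bit product split into two 64-bit halves.
struct Poly128 {
    std::uint64_t lo;  // bits 0..63 of the product
    std::uint64_t hi;  // bits 64..127 of the product
};

// Reference for a 64x64 -> 128-bit carry-less multiply (what pmull does on
// the .1d arrangement): shift-and-XOR instead of shift-and-add.
inline Poly128 clmul64_ref(std::uint64_t a, std::uint64_t b)
{
    Poly128 r = {0, 0};
    for (int i = 0; i < 64; ++i) {
        if ((b >> i) & 1u) {
            r.lo ^= a << i;
            // Bits of a shifted past bit 63 land in the high half.
            // i == 0 is skipped because a shift by 64 is undefined.
            if (i != 0) r.hi ^= a >> (64 - i);
        }
    }
    return r;
}
```

For example, clmul64_ref(3, 3) gives lo == 5 rather than 9, because (x + 1)^2 is x^2 + 1 when coefficients are added modulo 2.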
And the disassembly of main:
$ gdb -batch -ex 'disassemble main' ./test.exe
Dump of assembler code for function main:
0x00000000004005f0 <+0>: stp x29, x30, [sp,#-64]!
0x00000000004005f4 <+4>: mov x29, sp
0x00000000004005f8 <+8>: str w0, [x29,#28]
0x00000000004005fc <+12>: str x1, [x29,#16]
0x0000000000400600 <+16>: movi v0.4s, #0x0
0x0000000000400604 <+20>: mov x0, v0.d[0]
0x0000000000400608 <+24>: mov x1, v0.d[1]
0x000000000040060c <+28>: fmov d0, x0
0x0000000000400610 <+32>: mov v0.d[1], x1
0x0000000000400614 <+36>: str q0, [x29,#48]
0x0000000000400618 <+40>: ldr q0, 0x4006a0
0x000000000040061c <+44>: mov x0, v0.d[0]
0x0000000000400620 <+48>: mov x1, v0.d[1]
0x0000000000400624 <+52>: fmov d0, x0
0x0000000000400628 <+56>: mov v0.d[1], x1
0x000000000040062c <+60>: str q0, [x29,#32]
0x0000000000400630 <+64>: ldr x0, [x29,#32]
0x0000000000400634 <+68>: ldr x1, [x29,#40]
0x0000000000400638 <+72>: fmov d0, x0
0x000000000040063c <+76>: fmov d1, x1
0x0000000000400640 <+80>: pmull v0.1q, v0.1d, v0.1d
0x0000000000400644 <+84>: mov x0, v0.d[0]
0x0000000000400648 <+88>: mov x1, v0.d[1]
0x000000000040064c <+92>: fmov d0, x0
0x0000000000400650 <+96>: mov v0.d[1], x1
0x0000000000400654 <+100>: str q0, [x29,#48]
0x0000000000400658 <+104>: adrp x0, 0x410000
0x000000000040065c <+108>: add x0, x0, #0x9f0
0x0000000000400660 <+112>: ldr x4, [x0]
0x0000000000400664 <+116>: ldr x1, [x29,#48]
0x0000000000400668 <+120>: ldr x2, [x29,#56]
0x000000000040066c <+124>: adrp x0, 0x400000
0x0000000000400670 <+128>: add x0, x0, #0x748
0x0000000000400674 <+132>: mov x3, x2
0x0000000000400678 <+136>: mov x2, x1
0x000000000040067c <+140>: mov x1, x0
0x0000000000400680 <+144>: mov x0, x4
0x0000000000400684 <+148>: bl 0x4004a0 <fprintf@plt>
0x0000000000400688 <+152>: mov w0, #0x0 // #0
0x000000000040068c <+156>: ldp x29, x30, [sp],#64
0x0000000000400690 <+160>: ret
End of assembler dump.
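One thing the dump shows: both source operands of pmull are v0.1d. The template names %1 twice, so the second input (loaded into d1 just before the instruction) is never used, and the program multiplies a[0] by itself. Below is a hedged sketch of a two-source variant that uses %2 for the second input. The function name pmull_two_sources and the macro SKETCH_HAVE_PMULL are invented for this example; the feature-test macros are assumed to be set when the compiler is invoked with +crypto, and a portable fallback is included so the sketch also builds elsewhere.

```cpp
#include <cstdint>

#if defined(__aarch64__) && (defined(__ARM_FEATURE_AES) || defined(__ARM_FEATURE_CRYPTO))
#include <arm_neon.h>
#define SKETCH_HAVE_PMULL 1  // hypothetical macro, used only in this sketch
#endif

// Carry-less multiply of a and b; out[0] receives the low 64 bits of the
// product and out[1] the high 64 bits.
inline void pmull_two_sources(std::uint64_t a, std::uint64_t b, std::uint64_t out[2])
{
#if defined(SKETCH_HAVE_PMULL)
    uint64x1_t va = vcreate_u64(a);
    uint64x1_t vb = vcreate_u64(b);
    uint64x2_t r;
    // %1 and %2 are distinct inputs. No "cc" clobber is listed because
    // pmull does not modify the condition flags.
    __asm__("pmull %0.1q, %1.1d, %2.1d" : "=w"(r) : "w"(va), "w"(vb));
    out[0] = vgetq_lane_u64(r, 0);
    out[1] = vgetq_lane_u64(r, 1);
#else
    // Portable fallback: shift-and-XOR polynomial multiplication.
    out[0] = 0;
    out[1] = 0;
    for (int i = 0; i < 64; ++i) {
        if ((b >> i) & 1u) {
            out[0] ^= a << i;
            if (i != 0) out[1] ^= a >> (64 - i);
        }
    }
#endif
}
```

With the inputs from the test program, pmull_two_sources(2, 4, out) leaves 8 in out[0] and 0 in out[1], since x times x^2 is x^3.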
Is there an intrinsic for pmull and pmull2?
It looks like there are some intrinsics:
$ gcc -march=armv8-a+crc+crypto -E test.cc | grep -B 4 pmull
__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
vmull_high_p8 (poly8x16_t a, poly8x16_t b)
{
poly16x8_t result;
__asm__ ("pmull2 %0.8h,%1.16b,%2.16b"
--
__extension__ static __inline poly16x8_t __attribute__ ((__always_inline__))
vmull_p8 (poly8x8_t a, poly8x8_t b)
{
poly16x8_t result;
__asm__ ("pmull %0.8h, %1.8b, %2.8b"
--
static __inline poly128_t
vmull_p64 (poly64_t a, poly64_t b)
{
return __builtin_aarch64_crypto_pmulldi_ppp (a, b);
--
static __inline poly128_t
vmull_high_p64 (poly64x2_t a, poly64x2_t b)
{
return __builtin_aarch64_crypto_pmullv2di_ppp (a, b);
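Going by the header excerpt above, vmull_p64 multiplies two 64-bit scalars (pmull) and vmull_high_p64 multiplies the upper lanes of two 128-bit vectors (pmull2). Below is a rough sketch of calling them without any inline assembly. The names Product128, clmul_low and clmul_high are made up for this example; the vreinterpretq conversions are assumed to be available in the installed arm_neon.h (an older header such as the one shipped with GCC 4.9 may lack some of them); and the fallback path exists only so the sketch builds on other targets.

```cpp
#include <cstdint>

#if defined(__aarch64__) && (defined(__ARM_FEATURE_AES) || defined(__ARM_FEATURE_CRYPTO))
#include <arm_neon.h>
#define SKETCH_USE_INTRINSICS 1  // hypothetical macro, used only in this sketch
#endif

struct Product128 { std::uint64_t lo, hi; };  // hypothetical result type

// Portable shift-and-XOR fallback used when the intrinsics are unavailable.
inline Product128 clmul_fallback(std::uint64_t a, std::uint64_t b)
{
    Product128 r = {0, 0};
    for (int i = 0; i < 64; ++i) {
        if ((b >> i) & 1u) {
            r.lo ^= a << i;
            if (i != 0) r.hi ^= a >> (64 - i);
        }
    }
    return r;
}

// Product of the low lanes a[0] and b[0]; vmull_p64 maps to pmull.
inline Product128 clmul_low(const std::uint64_t a[2], const std::uint64_t b[2])
{
#if defined(SKETCH_USE_INTRINSICS)
    poly128_t p = vmull_p64((poly64_t)a[0], (poly64_t)b[0]);
    uint64x2_t v = vreinterpretq_u64_p128(p);
    Product128 r = { vgetq_lane_u64(v, 0), vgetq_lane_u64(v, 1) };
    return r;
#else
    return clmul_fallback(a[0], b[0]);
#endif
}

// Product of the high lanes a[1] and b[1]; vmull_high_p64 maps to pmull2.
inline Product128 clmul_high(const std::uint64_t a[2], const std::uint64_t b[2])
{
#if defined(SKETCH_USE_INTRINSICS)
    poly64x2_t va = vreinterpretq_p64_u64(vld1q_u64(a));
    poly64x2_t vb = vreinterpretq_p64_u64(vld1q_u64(b));
    uint64x2_t v = vreinterpretq_u64_p128(vmull_high_p64(va, vb));
    Product128 r = { vgetq_lane_u64(v, 0), vgetq_lane_u64(v, 1) };
    return r;
#else
    return clmul_fallback(a[1], b[1]);
#endif
}
```

With a = {2, 4} passed as both arguments, clmul_low returns lo == 4 and clmul_high returns lo == 16. Compared with inline assembly, the intrinsics leave register allocation and scheduling to the compiler.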