How to assemble ARM SVE instructions with GNU GAS or LLVM and run it on QEMU?

Problem description

I want to play with the new ARM SVE instructions using open source tools.

As a start, I would like to assemble the minimal example present at: https://developer.arm.com/docs/dui0965/latest/getting-started-with-the-sve-compiler/assembling-sve-code

// example1.S
    .global main
main:
    mov     x0, 0x90000000
    mov     x8, xzr
    ptrue   p0.s                        //SVE instruction
    fcpy    z0.s, p0/m, #5.00000000     //SVE instruction
    orr     w10, wzr, #0x400
loop:
    st1w    z0.s, p0, [x0, x8, lsl #2]  //SVE instruction
    incw    x8                          //SVE instruction
    whilelt p0.s, x8, x10               //SVE instruction
    b.any   loop                        //SVE instruction
    mov     w0, wzr
    ret

However, when I try that on my Ubuntu 16.04:

sudo apt-get install binutils-aarch64-linux-gnu
aarch64-linux-gnu-as example1.S

it does not recognize any of the SVE assembly instructions, e.g.:

example1.S:6: Error: unknown mnemonic `ptrue' -- `ptrue p0.s'

I think this is because my GNU AS 2.26.1 is too old and does not have SVE support yet.

I'm also fine using LLVM or any other open source assembler.

Once I manage to assemble it, I want to run it in QEMU user mode, since QEMU 3.0.0 has SVE support.

Solution

Automated example with an assertion

Below I describe how that example was put together.

Assembly

The aarch64-linux-gnu-as 2.30 in Ubuntu 18.04 is already new enough for SVE as can be seen from: https://sourceware.org/binutils/docs-2.30/as/AArch64-Extensions.html#AArch64-Extensions

Otherwise, compiling Binutils from source is easy on Ubuntu 16.04, just do:

git clone git://sourceware.org/git/binutils-gdb.git
cd binutils-gdb
# master that I tested with.
git checkout 4de5434b694fc260d02610e8e7fec21b2923600a
./configure --target aarch64-elf --prefix "$(pwd)/ble"
make -j `nproc`
make install

I didn't check out a tag because the last tag is a few months old, and I don't feel like grepping log messages for when SVE was introduced ;-)

Then use the compiled as and link with the packaged GCC on Ubuntu 16.04:

./binutils-gdb/ble/bin/aarch64-elf-as -c -march=armv8.5-a+sve \
    -o example1.o example1.S
aarch64-linux-gnu-gcc -march=armv8.5-a -nostdlib -o example1 example1.o

On Ubuntu 16.04, aarch64-linux-gnu-gcc 5.4 does not support -march=armv8.5-a, so just use -march=armv8-a instead. That should be fine, since GCC is only linking the already-assembled object here. In any case, the GCC packaged in neither Ubuntu 16.04 nor 18.04 supports -march=armv8-a+sve, which will be the best option once it arrives.

Alternatively, instead of passing -march=armv8.5-a+sve, you can also add the following to the start of the .S source code:

.arch armv8.5-a+sve

On Ubuntu 19.04 Binutils 2.32, I also learnt about and tested:

aarch64-linux-gnu-as -march=all

which also works for SVE. I think I'll be using that more in the future, as it seems to enable all features in one go, not just SVE!

QEMU simulation

The procedure to step debug it on QEMU is explained at: How to single step ARM assembly in GDB on QEMU?

First, I made the example into a minimal self-contained Linux executable:

.data
    x: .double        1.5,  2.5,  3.5,  4.5
    y: .double        5.0,  6.0,  7.0,  8.0
    y_expect: .double 8.0, 11.0, 14.0, 17.0
    a: .double        2.0
    n: .word          4

.text
.global _start
_start:
    ldr x0, =x
    ldr x1, =y
    ldr x2, =a
    ldr x3, =n
    bl daxpy

    /* exit */
    mov x0, #0
    mov x8, #93
    svc #0


/* Multiply by a scalar and add.
 *
 * Operation:
 *
 *      Y += a * X
 *
 * C signature:
 *
 *      void daxpy(double *x, double *y, double *a, int *n)
 *
 * The name "daxpy" comes from BLAS, as documented in the LAPACK reference:
 * http://www.netlib.org/lapack/explore-html/de/da4/group__double__blas__level1_ga8f99d6a644d3396aa32db472e0cfc91c.html
 *
 * Adapted from: https://alastairreid.github.io/papers/sve-ieee-micro-2017.pdf
 */
daxpy:
    ldrsw x3, [x3]
    mov x4, #0
    whilelt p0.d, x4, x3
    ld1rd z0.d, p0/z, [x2]
.loop:
    ld1d z1.d, p0/z, [x0, x4, lsl #3]
    ld1d z2.d, p0/z, [x1, x4, lsl #3]
    fmla z2.d, p0/m, z1.d, z0.d
    st1d z2.d, p0, [x1, x4, lsl #3]
    incd x4
    whilelt p0.d, x4, x3
    b.first .loop
    ret

You can run it with:

qemu-aarch64 -L /usr/aarch64-linux-gnu -E LD_BIND_NOW=1 ./example1

then it exits nicely.

Next, we can step debug to confirm that the sum was actually made:

qemu-aarch64 -g 1234 -L /usr/aarch64-linux-gnu -E LD_BIND_NOW=1 ./example1

and:

./binutils-gdb/ble/bin/aarch64-elf-gdb -ex 'file example1' \
  -ex 'target remote localhost:1234' -ex 'set sysroot /usr/aarch64-linux-gnu'

Now, step until right after the bl daxpy call, and run:

>>> p (double[4])y_expect
$1 = {[0] = 8, [1] = 11, [2] = 14, [3] = 17}
>>> p (double[4])y
$2 = {[0] = 8, [1] = 11, [2] = 14, [3] = 17}

which confirms that the sum was actually done as expected.

Observing SVE registers in GDB seems to be unimplemented, as I can't find anything about them under: https://github.com/qemu/qemu/tree/v3.0.0/gdb-xml but presumably it would not be too hard to implement by copying what is done for the other FP registers? Asked at: http://lists.nongnu.org/archive/html/qemu-discuss/2018-10/msg00020.html

You can currently already observe them partially and indirectly with:

i r d0 d1 d2

because the low bits of each SVE register zX are shared with the older vX FP registers, so dX shows the first double-precision element of zX. The predicate registers pX, however, are not visible at all.
