Performance of unaligned SIMD load/store on aarch64

Question

An older answer indicates that aarch64 supports unaligned reads/writes and mentions a performance cost, but it is unclear whether that answer covers only ALU operations or SIMD (128-bit register) operations as well.

Relative to aligned 128-bit NEON loads and stores, how much slower (if at all) are unaligned 128-bit NEON loads and stores on aarch64?

Are there separate instructions for unaligned SIMD loads and stores (as is the case with SSE2), or are the known-aligned loads/stores the same instructions as the potentially-unaligned loads/stores?

Recommended answer

Section 4.6, Load/Store Alignment, of the Cortex-A57 Software Optimization Guide says:

The ARMv8-A architecture allows many types of load and store accesses to be arbitrarily aligned. The Cortex-A57 processor handles most unaligned accesses without performance penalties. However, there are cases which reduce bandwidth or incur additional latency, as described below:

  • Load operations that cross a cache-line (64-byte) boundary
  • Store operations that cross a 16-byte boundary
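
The two cases above can be expressed as a small address check. The following is only a sketch: the helper names (`crosses_boundary`, `load_may_be_penalized`, `store_may_be_penalized`) are hypothetical, and the 64-byte and 16-byte constants are the ones quoted for the Cortex-A57, so other cores may use different boundaries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Boundaries quoted from the Cortex-A57 guide; assumed here, other cores may differ. */
#define LOAD_PENALTY_BOUNDARY  64u  /* cache-line size */
#define STORE_PENALTY_BOUNDARY 16u

/* Hypothetical helper: true if the byte range [addr, addr + size)
 * touches more than one `boundary`-sized block. */
static bool crosses_boundary(uintptr_t addr, size_t size, size_t boundary)
{
    assert(size > 0);
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0); /* power of two */

    uintptr_t mask  = ~(uintptr_t)(boundary - 1);
    uintptr_t first = addr & mask;               /* block holding the first byte */
    uintptr_t last  = (addr + size - 1) & mask;  /* block holding the last byte */
    return first != last;
}

/* Load case: the access spans two 64-byte cache lines. */
static bool load_may_be_penalized(const void *p, size_t size)
{
    return crosses_boundary((uintptr_t)p, size, LOAD_PENALTY_BOUNDARY);
}

/* Store case: the access spans two 16-byte blocks. */
static bool store_may_be_penalized(const void *p, size_t size)
{
    return crosses_boundary((uintptr_t)p, size, STORE_PENALTY_BOUNDARY);
}
```

Under these two rules, every misaligned 128-bit store crosses a 16-byte boundary, while a misaligned 128-bit load crosses a cache line only when it starts more than 48 bytes into the line.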

So the cost may depend on the processor you are using: out-of-order (Cortex-A57, A72, A75) or in-order (Cortex-A35, A53, A55). I didn't find an optimization guide for the in-order processors, but they do have a hardware performance counter event that you can use to check whether the number of unaligned accesses actually affects performance:

    0x0F  UNALIGNED_LDST_RETIRED  Unaligned load-store

This event can be counted with the perf tool.

There are no special instructions for unaligned accesses in AArch64. The same load/store instructions are used whether or not the address is aligned.
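
As an illustration of that last point, here is a minimal sketch in which a single NEON intrinsic, `vld1q_u8`, serves both aligned and unaligned pointers. The helper name `sum16` is hypothetical, and the sketch assumes Normal memory with alignment checking disabled, as is typical for user-space code. The non-AArch64 branch is only a portable fallback so the code builds elsewhere.

```c
#include <stdint.h>
#include <string.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: sums the 16 bytes starting at p.
 * p may have any alignment. */
static unsigned sum16(const uint8_t *p)
{
#if defined(__aarch64__)
    uint8x16_t v = vld1q_u8(p);  /* same intrinsic for aligned and unaligned p */
    return vaddlvq_u8(v);        /* widening horizontal add across all 16 lanes */
#else
    /* Portable fallback so the sketch also builds on non-AArch64 hosts. */
    uint8_t tmp[16];
    unsigned s = 0;
    memcpy(tmp, p, sizeof tmp);
    for (int i = 0; i < 16; i++)
        s += tmp[i];
    return s;
#endif
}
```

Calling `sum16(buf)` and `sum16(buf + 1)` on the same buffer uses the same code path. Unlike SSE2's `_mm_load_si128` versus `_mm_loadu_si128`, no separate unaligned variant is needed.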
