Performance of unaligned SIMD load/store on aarch64

Views: 159 Published: 2020/8/22 21:29:10 Tags: alignment simd neon arm64


Question


An older answer indicates that aarch64 supports unaligned reads/writes and mentions a performance cost, but it's unclear whether that answer covers only ALU operations or SIMD (128-bit register) operations too.


Relative to aligned 128-bit NEON loads and stores, how much slower (if at all) are unaligned 128-bit NEON loads and stores on aarch64?


Are there separate instructions for unaligned SIMD loads and stores (as is the case with SSE2) or are the known-aligned loads/stores the same instructions as potentially-unaligned loads/stores?

Answer

According to the Cortex-A57 Software Optimization Guide, section 4.6 "Load/Store Alignment":


The ARMv8-A architecture allows many types of load and store accesses to be arbitrarily aligned. The Cortex-A57 processor handles most unaligned accesses without performance penalties. However, there are cases which reduce bandwidth or incur additional latency, as described below:

Load operations that cross a cache-line (64-byte) boundary
Store operations that cross a 16-byte boundary
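Whether a given access falls into one of those two slow cases (a load crossing a 64-byte cache line, a store crossing a 16-byte boundary) is simple address arithmetic. The sketch below assumes those Cortex-A57 rules exactly as quoted; other cores may use different boundaries, and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if an access of `size` bytes (size >= 1) starting at `addr`
 * spans a `boundary`-byte boundary. `boundary` must be a power of two. */
bool crosses_boundary(uintptr_t addr, size_t size, uintptr_t boundary) {
    uintptr_t first_block = addr & ~(boundary - 1);
    uintptr_t last_block  = (addr + size - 1) & ~(boundary - 1);
    return first_block != last_block;
}

/* Cortex-A57 rule as quoted: loads pay extra when they cross a 64-byte line. */
bool a57_load_may_be_slow(uintptr_t addr, size_t size) {
    return crosses_boundary(addr, size, 64);
}

/* Cortex-A57 rule as quoted: stores pay extra when they cross 16 bytes. */
bool a57_store_may_be_slow(uintptr_t addr, size_t size) {
    return crosses_boundary(addr, size, 16);
}
```

For a 128-bit (16-byte) NEON access this means an unaligned load is only penalised when it straddles a cache line, while any store that is not 16-byte aligned straddles a 16-byte boundary.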


So it may depend on which processor you are using: out-of-order (Cortex-A57, A72, A75) or in-order (Cortex-A35, A53, A55). I didn't find an optimization guide for the in-order processors, but they do have a hardware performance counter that you can use to check whether unaligned accesses actually affect performance:

    0x0F  UNALIGNED_LDST_RETIRED  Unaligned load-store

This event can be counted with the Linux perf tool (for example, as the raw event r0f).
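The same counter (PMU event number 0x0F, UNALIGNED_LDST_RETIRED) can also be read from inside a program via the Linux `perf_event_open` system call. This is a minimal sketch, assuming Linux on AArch64, a core that implements event 0x0F, and a `perf_event_paranoid` setting that allows user-space counting; the function name is hypothetical, and on any other target it simply returns -1.

```c
#define _GNU_SOURCE
#if defined(__linux__) && defined(__aarch64__)
#include <linux/perf_event.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#endif

/* Counts retired unaligned loads/stores while `work` runs in this thread.
 * Returns -1 (without calling `work`) if the counter cannot be opened,
 * or always -1 off Linux/AArch64, where raw event 0x0F means something else. */
long long count_unaligned_ldst(void (*work)(void)) {
#if defined(__linux__) && defined(__aarch64__)
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.type = PERF_TYPE_RAW;
    attr.config = 0x0F;            /* UNALIGNED_LDST_RETIRED */
    attr.disabled = 1;             /* start stopped, enable explicitly */
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;

    int fd = (int)syscall(SYS_perf_event_open, &attr,
                          0,   /* calling thread */
                          -1,  /* any CPU */
                          -1,  /* no group leader */
                          0);
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    work();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    long long count = -1;
    if (read(fd, &count, sizeof count) != (ssize_t)sizeof count)
        count = -1;
    close(fd);
    return count;
#else
    (void)work;
    return -1;
#endif
}
```

Comparing the count for a loop over a deliberately misaligned buffer with the count for an aligned one shows whether the hot loop is issuing unaligned accesses at all; the timing difference still has to be measured separately.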


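There are no special instructions for unaligned accesses in AArch64: the same load/store instructions (for example LDR/STR of a Q register, or LD1/ST1) are used whether or not the address is aligned.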
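Unlike SSE2, which has separate aligned (MOVDQA) and unaligned (MOVDQU) loads, NEON code just uses one intrinsic such as `vld1q_u8`/`vst1q_u8` for any address. A minimal sketch, assuming a hypothetical `add_one_u8` helper; on non-AArch64 targets it falls back to the scalar loop so it still compiles.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Adds 1 (mod 256) to each byte of src and writes it to dst.
 * Neither pointer needs any particular alignment. */
void add_one_u8(uint8_t *dst, const uint8_t *src, size_t n) {
    size_t i = 0;
#if defined(__aarch64__)
    const uint8x16_t one = vdupq_n_u8(1);
    for (; i + 16 <= n; i += 16) {
        /* Same 128-bit load/store instructions whether or not
         * src + i / dst + i are 16-byte aligned. */
        uint8x16_t v = vld1q_u8(src + i);
        vst1q_u8(dst + i, vaddq_u8(v, one));
    }
#endif
    /* Remaining tail bytes (or the whole array off AArch64). */
    for (; i < n; i++)
        dst[i] = (uint8_t)(src[i] + 1);
}
```

Calling it with `buf + 1` as the source works the same as with an aligned pointer; any cost shows up only in throughput, per the boundary-crossing cases quoted above.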
