Load address calculation when using AVX2 gather instructions
Question
Looking at the AVX2 intrinsics documentation, there are gathered load instructions such as VPGATHERDD:
__m128i _mm_i32gather_epi32 (int const * base, __m128i index, const int scale);
What isn't clear to me from the documentation is whether the calculated load address is an element address or a byte address, i.e. is the load address for element i:
load_addr = base + index[i] * scale; // (1) element addressing ?
or:
load_addr = (char *)base + index[i] * scale; // (2) byte addressing ?
From the Intel docs it looks like it might be (2), but this doesn't make much sense given that the smallest element size for gathered loads is 32 bits. Why would you want to load from misaligned addresses (i.e. use scale < 4)?
Recommended answer
Gather instructions do not have any alignment requirements, so it would be too restrictive not to allow byte addressing.
Another reason is consistency. With SIB addressing we obviously have a byte address:
MOV eax, [rcx + rdx * 2]
Since VPGATHERDD is just a vectorized variant of this MOV instruction, we should not expect anything different with VSIB addressing:
VPGATHERDD ymm0, [rcx + ymm2 * 2], ymm3
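To make the byte-addressing rule concrete, here is a minimal scalar sketch of what a 4-lane 32-bit gather computes. The helper name gather_epi32_model is hypothetical and not part of any Intel API; it only models the address arithmetic of option (2) from the question, assuming scale is one of 1, 2, 4 or 8.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of a 4-lane 32-bit gather (hypothetical helper, for illustration).
   Each lane loads 4 bytes from: (char *)base + index[i] * scale. */
static void gather_epi32_model(const void *base, const int32_t index[4],
                               int scale, int32_t out[4])
{
    for (int i = 0; i < 4; ++i) {
        /* Byte addressing: the index is sign-extended to 64 bits, then scaled. */
        const char *addr = (const char *)base + (int64_t)index[i] * scale;
        /* memcpy is used because the address need not be 4-byte aligned. */
        memcpy(&out[i], addr, sizeof out[i]);
    }
}
```

With an int array as base, scale = 4 makes the indices behave like ordinary element indices, while scale = 1 turns them into raw byte offsets.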
As for a real-life use of byte addressing, consider a 24-bit color image in which the pixels are packed at a 3-byte stride. We could load 8 pixels with a single VPGATHERDD instruction, but only if the "scale" field in VSIB is "1" and VPGATHERDD uses byte addressing.
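The 24-bit pixel case could be sketched as follows. The helper name load8_rgb24 and the output layout are assumptions made for illustration. Because each lane still reads 4 bytes, the sketch assumes the caller provides at least one readable byte after the last pixel, and it masks off the top byte of each lane (little-endian, as on x86). The scalar branch is only a fallback so the code also compiles when AVX2 is not enabled.

```c
#include <stdint.h>
#include <string.h>
#ifdef __AVX2__
#include <immintrin.h>
#endif

/* Hypothetical helper: load 8 packed 24-bit pixels into 8 x 32-bit lanes.
   Assumption: at least 1 readable byte follows the 24 bytes of pixel data,
   since every lane reads 4 bytes. */
static void load8_rgb24(const unsigned char *pixels, uint32_t out[8])
{
#ifdef __AVX2__
    /* Byte offsets of the 8 pixels: 0, 3, 6, ..., 21. */
    const __m256i offsets = _mm256_setr_epi32(0, 3, 6, 9, 12, 15, 18, 21);
    /* scale = 1: each offset is used directly as a byte offset from the base. */
    __m256i v = _mm256_i32gather_epi32((const int *)(const void *)pixels,
                                       offsets, 1);
    /* Clear the top byte of each lane; it belongs to the following pixel. */
    v = _mm256_and_si256(v, _mm256_set1_epi32(0x00FFFFFF));
    _mm256_storeu_si256((__m256i *)out, v);
#else
    /* Scalar fallback with the same byte-addressed semantics. */
    for (int i = 0; i < 8; ++i) {
        uint32_t lane;
        memcpy(&lane, pixels + 3 * i, sizeof lane);
        out[i] = lane & 0x00FFFFFFu;
    }
#endif
}
```

With scale = 4 the same eight lanes would start at byte offsets 0, 12, 24 and so on, which does not match a 3-byte pixel stride.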