Alignment in VLD1
I have a question about ARM Neon VLD1 instruction's alignment. How does the alignment in the following code work?
DATA .req r0
vld1.16 {d16, d17, d18, d19}, [DATA, :128]!
Does the starting address of this read shift to DATA plus some non-negative offset, so that it becomes the smallest multiple of 16 (16 bytes = 128 bits) that is no less than DATA? Or does DATA itself change to the smallest multiple of 16 that is no less than DATA?
Neither: the address is not adjusted at all. The :128 qualifier is a hint to the CPU that the address in DATA is already 16-byte aligned. The only explanation of its usefulness I have found is a blog post on ARM's site, which claims that it makes the load faster but doesn't say how or why. Probably it lets the CPU issue wider loads.
"You can also specify an alignment for the pointer passed in Rn, using the optional :<align> parameter, which often speeds up memory accesses."
If you provide the hint, you must make sure that DATA is aligned to 16 bytes; otherwise you'll get a hardware exception.
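On the caller's side, a minimal C sketch of how one might guarantee and verify that 16-byte alignment before handing a pointer to code that uses [DATA, :128]. The helper names is_aligned and alloc_aligned_16 are made up for illustration, and aligned_alloc assumes a C11 library:

```c
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if p is a multiple of `alignment` bytes, 0 otherwise.
   With alignment == 16 this is the condition that :128 asserts. */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* Hypothetical helper: allocate `count` 16-bit elements on a 16-byte
   boundary. C11 aligned_alloc expects the size to be a multiple of the
   alignment, so the byte count is rounded up to the next multiple of 16. */
static int16_t *alloc_aligned_16(size_t count)
{
    size_t bytes = count * sizeof(int16_t);
    size_t rounded = (bytes + 15u) & ~(size_t)15u;
    return aligned_alloc(16, rounded);
}
```

A buffer obtained this way can be released with free(). For a pointer that comes from somewhere else, checking is_aligned(p, 16) first and falling back to a load without the :128 qualifier is one way to avoid the exception.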
The alignment exception is specified in the VLD1 pseudocode in the ARM Architecture Reference Manual (ARM ARM):
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled(); NullCheckIfThumbEE(n);
    address = R[n]; if (address MOD alignment) != 0 then GenerateAlignmentException();
    if wback then R[n] = R[n] + (if register_index then R[m] else ebytes);
    Elem[D[d],index,esize] = MemU[address,ebytes];
The key line is:
if (address MOD alignment) != 0 then GenerateAlignmentException();
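To make the consequence concrete, here is a rough C model of just the address handling in that pseudocode. It is only a sketch: the struct and function names are invented, and the writeback amount is passed in as a parameter (32 bytes is my assumption for the four D registers in the question, whereas the quoted pseudocode uses ebytes). The point is that the address is used exactly as given or the access faults; it is never rounded.

```c
#include <stdbool.h>
#include <stdint.h>

/* Result of the modeled address handling (hypothetical type). */
typedef struct {
    bool     fault;        /* true if an alignment exception would be raised */
    uint32_t access_addr;  /* address the load would use                     */
    uint32_t rn_after;     /* value of Rn after the instruction              */
} vld1_model_result;

/* Models "address = R[n]; if (address MOD alignment) != 0 then fault;
   if wback then R[n] = R[n] + transfer_bytes". Nothing else is modeled. */
static vld1_model_result model_vld1(uint32_t rn, uint32_t alignment,
                                    uint32_t transfer_bytes, bool wback)
{
    vld1_model_result r = { false, rn, rn };
    if (rn % alignment != 0) {
        r.fault = true;            /* misaligned: exception, Rn unchanged */
        return r;
    }
    if (wback)
        r.rn_after = rn + transfer_bytes;  /* post-increment from "!" */
    return r;
}
```

With rn = 0x1000 the model loads from 0x1000 and leaves Rn at 0x1020; with rn = 0x1004 it reports a fault instead of moving the address to 0x1010.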
I don't actually understand why the CPU can't check the alignment itself and pick the fastest path. Maybe that would cost too many cycles.