Alignment in VLD1

Problem description

I have a question about alignment in the ARM NEON VLD1 instruction. How does the alignment qualifier in the following code work?

DATA            .req r0  
vld1.16         {d16, d17, d18, d19}, [DATA, :128]!  

Does the starting address of this load shift to DATA plus some non-negative offset, so that it becomes the smallest multiple of 16 (16 bytes = 128 bits) that is no less than DATA? Or does DATA itself change to the smallest multiple of 16 that is no less than DATA?

Solution

It is a hint to the CPU: nothing is rounded up, neither the address used for the load nor DATA itself. The only thing I have read about the usefulness of such a hint is a blog post on ARM's site, which claims it makes the load faster but doesn't say how or why. Probably the CPU can issue wider loads when it knows the address is aligned.

"You can also specify an alignment for the pointer passed in Rn, using the optional :<align> parameter, which often speeds up memory accesses."

If you provide the hint, you must make sure that DATA is aligned to 16 bytes; otherwise you'll get a hardware exception (an alignment fault).
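Since the alignment has to be guaranteed by whoever sets up the pointer, here is a minimal C11 sketch of two ways to get 16-byte-aligned storage before handing it to the load. The names `neon_input` and `alloc_neon_u16` are made up for illustration, and I'm assuming a toolchain that provides C11 `alignas` and `aligned_alloc`.

```c
#include <assert.h>     /* static_assert (C11) */
#include <stdalign.h>   /* alignas (C11) */
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>     /* aligned_alloc, free (C11) */

/* Hypothetical static input buffer: 16 halfwords = 32 bytes, the amount
   that one vld1.16 {d16, d17, d18, d19} load reads. alignas(16) asks the
   compiler to place it on a 16-byte boundary, which is what :128 needs. */
static alignas(16) uint16_t neon_input[16];
static_assert(sizeof neon_input == 32, "four D registers hold 32 bytes");

/* Hypothetical helper: heap buffer for `count` 16-bit elements whose start
   address is 16-byte aligned. The byte size is rounded up to a multiple of
   16 because C11 aligned_alloc expects the size to be a multiple of the
   alignment. Returns NULL on failure; release the buffer with free(). */
static uint16_t *alloc_neon_u16(size_t count)
{
    size_t bytes = count * sizeof(uint16_t);
    size_t rounded = (bytes + 15u) & ~(size_t)15u;
    return aligned_alloc(16, rounded);
}
```

Either `neon_input` or a pointer returned by `alloc_neon_u16` could then be passed in r0 as DATA. Note that only the start of the buffer is guaranteed to be aligned; a pointer into the middle of it may not be.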

This hardware behavior is specified in the VLD1 description in the ARM Architecture Reference Manual (ARM ARM) as:

if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled(); NullCheckIfThumbEE(n);
    address = R[n]; if (address MOD alignment) != 0 then GenerateAlignmentException();
    if wback then R[n] = R[n] + (if register_index then R[m] else ebytes);
    Elem[D[d],index,esize] = MemU[address,ebytes];

The key line is this one:

if (address MOD alignment) != 0 then GenerateAlignmentException();
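To make the difference between the two behaviors concrete, here is a small C sketch. It is my own model of that pseudocode line, not anything taken from the ARM ARM, and both function names are made up. The first function mirrors the check the hardware performs; the second shows the rounding that the question asks about, which the instruction does not do and which software would have to do itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors `(address MOD alignment) != 0` from the pseudocode.
   `alignment` is in bytes, so a :128 qualifier corresponds to 16.
   A true result corresponds to GenerateAlignmentException(). */
static bool alignment_check_fails(uintptr_t address, size_t alignment)
{
    assert(alignment != 0);
    return (address % alignment) != 0;
}

/* What the instruction does NOT do: round the address up to the smallest
   multiple of `alignment` that is no less than `address`.
   `alignment` must be a power of two for the mask trick to be valid. */
static uintptr_t round_up_to_alignment(uintptr_t address, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (address + (alignment - 1)) & ~((uintptr_t)alignment - 1);
}
```

For example, with an address of 0x1008 and an alignment of 16, `alignment_check_fails` returns true, while `round_up_to_alignment` gives 0x1010. The hardware only does the former.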

I don't actually understand why the CPU can't check the alignment itself and pick the best strategy. Maybe that would cost too many cycles.
