Addressable vs Alignment

Question

My questions are about memory.

  1. What is the difference between alignment and addressability? If a memory is byte-addressable and word-aligned, couldn't we just have a word-addressable memory instead?

Also, when we say we have a block transfer, what does that mean? If the block is larger than a word (the width of the data bus), does the transfer take more than one cycle?

Recommended answer

Short answer

Addressability refers to the smallest unit of memory that can be accessed, i.e. the smallest unit that has its own address, while alignment relates to accesses to larger groupings of memory (typically called words).

This question is probably best answered with an example. Imagine a byte-addressable architecture with 4-byte words. For this example we will only consider loads, and assume there are two kinds: LB (load byte) and LW (load word, 4 bytes).
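
As a minimal Python sketch of this setup (the names `WORD_SIZE`, `is_aligned` and `word_base` are hypothetical, chosen for illustration): an access is aligned when its address is a multiple of the access size, and every byte address falls inside exactly one aligned 4-byte word.

```python
WORD_SIZE = 4  # bytes per word in the hypothetical architecture


def is_aligned(addr: int, size: int = WORD_SIZE) -> bool:
    """An access of `size` bytes is aligned if its address is a multiple of `size`."""
    return addr % size == 0


def word_base(addr: int) -> int:
    """Address of the aligned word containing byte `addr`.

    Clearing the low bits works because WORD_SIZE is a power of two.
    """
    return addr & ~(WORD_SIZE - 1)


for a in (0x10, 0x11, 0x14):
    print(hex(a), is_aligned(a), hex(word_base(a)))
# 0x10 True 0x10
# 0x11 False 0x10
# 0x14 True 0x14
```

Byte addresses 0x10 through 0x13 all map to the word at 0x10; only 0x10 itself is word-aligned.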

First consider an LB operation. In this case the CPU actually accesses 4 bytes from the cache and then shifts the output to pick out the desired byte. So LB 0x10 and LB 0x11 each access the 4 bytes of memory from 0x10 to 0x13. The alignment of this access doesn't really matter.
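
The LB path described above can be sketched in Python as follows. This is a toy model, not real hardware: it assumes a little-endian machine, a hypothetical `read_aligned_word` standing in for one cache access that only ever returns a whole aligned word, and made-up memory contents (bytes 0x10 to 0x17 hold 0xA0 to 0xA7).

```python
WORD_SIZE = 4

# Hypothetical byte-addressable memory: bytes 0x10..0x17 hold 0xA0..0xA7.
memory = bytearray(0x20)
for i in range(8):
    memory[0x10 + i] = 0xA0 + i


def read_aligned_word(addr: int) -> int:
    """Model of one cache access: always returns a whole aligned word."""
    assert addr % WORD_SIZE == 0, "the cache only serves aligned words"
    return int.from_bytes(memory[addr:addr + WORD_SIZE], "little")


def load_byte(addr: int) -> int:
    """LB: fetch the aligned word containing addr, then shift out the wanted byte."""
    base = addr - (addr % WORD_SIZE)
    word = read_aligned_word(base)
    offset = addr - base  # 0..3, position of the byte within the word
    return (word >> (8 * offset)) & 0xFF


print(hex(load_byte(0x10)), hex(load_byte(0x11)))  # 0xa0 0xa1
```

Both loads read the same aligned word at 0x10; only the shift amount differs, which is why the alignment of a byte load does not matter.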

Next consider an aligned LW operation. LW reads 4 bytes from memory, so LW 0x10 reads the 4 bytes from address 0x10 to 0x13. This can be done in a single access, just like the LB operation.

However, LW 0x11 would be an unaligned access. It needs the data from 0x11 to 0x14, but data is read from the cache in 4-byte chunks. So the CPU would read the 4 bytes from 0x10 to 0x13 and also have to perform a second access that reads 0x14 to 0x17. It would then pick the desired bytes, 0x11 to 0x14, out of these two accesses.
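
A minimal Python sketch of the aligned and unaligned LW cases, using the same toy assumptions as a byte-load model (little-endian, a hypothetical `read_aligned_word` that stands for one cache access, made-up memory contents). The `accesses` list logs each aligned read so the number of cache accesses is visible.

```python
WORD_SIZE = 4

# Hypothetical byte-addressable memory: bytes 0x10..0x17 hold 0xA0..0xA7.
memory = bytearray(0x20)
for i in range(8):
    memory[0x10 + i] = 0xA0 + i

accesses = []  # log of aligned word reads, i.e. cache accesses


def read_aligned_word(addr: int) -> int:
    """Model of one cache access: always returns a whole aligned word."""
    assert addr % WORD_SIZE == 0, "the cache only serves aligned words"
    accesses.append(addr)
    return int.from_bytes(memory[addr:addr + WORD_SIZE], "little")


def load_word(addr: int) -> int:
    """LW: one aligned read if addr is aligned, otherwise two reads plus a merge."""
    offset = addr % WORD_SIZE
    base = addr - offset
    lo = read_aligned_word(base)
    if offset == 0:
        return lo  # aligned: a single access is enough
    hi = read_aligned_word(base + WORD_SIZE)  # unaligned: also need the next word
    combined = lo | (hi << (8 * WORD_SIZE))   # the 8 bytes base..base+7
    return (combined >> (8 * offset)) & 0xFFFFFFFF  # keep bytes addr..addr+3


accesses.clear()
print(hex(load_word(0x10)), [hex(a) for a in accesses])  # 0xa3a2a1a0 ['0x10']
accesses.clear()
print(hex(load_word(0x11)), [hex(a) for a in accesses])  # 0xa4a3a2a1 ['0x10', '0x14']
```

The unaligned load returns bytes 0x11 to 0x14 but costs two aligned reads plus the extra shift-and-merge step.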

There are ways to optimize unaligned accesses at the microarchitecture level so that an unaligned access is not as costly as two aligned accesses, but an unaligned access will always require the CPU to do more work than an aligned access. As a result, some architectures prohibit unaligned accesses. LB is still allowed even on architectures that prohibit unaligned accesses, because an LB operation only needs a subset of the bytes from one aligned access, whereas an unaligned access needs a subset of the bytes from two aligned accesses. Other architectures allow unaligned accesses so as not to limit programmer flexibility, but they still recommend that programs use aligned accesses whenever possible.
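
To make the strict-versus-permissive distinction concrete, here is a hypothetical Python model (not any particular real ISA): a "strict" machine raises an `AlignmentFault`, an exception invented for this sketch, whenever the address is not a multiple of the access size, while a permissive machine accepts the access and reports how many aligned word accesses it will need. Because every address is a multiple of 1, a byte load is never rejected.

```python
WORD_SIZE = 4  # bytes per word in the hypothetical architecture


class AlignmentFault(Exception):
    """Hypothetical trap a strict architecture raises on an unaligned access."""


def aligned_accesses_needed(addr: int, size: int) -> int:
    """Number of aligned words spanned by the bytes addr..addr+size-1."""
    first_word = addr // WORD_SIZE
    last_word = (addr + size - 1) // WORD_SIZE
    return last_word - first_word + 1


def access(addr: int, size: int, strict: bool) -> int:
    """Return how many aligned cache accesses a load of `size` bytes needs.

    A strict machine rejects any access whose address is not a multiple
    of its size; a byte access (size 1) therefore always passes.
    """
    if strict and addr % size != 0:
        raise AlignmentFault(f"unaligned {size}-byte access at {addr:#x}")
    return aligned_accesses_needed(addr, size)


print(access(0x11, 1, strict=True))   # LB 0x11 on a strict machine: 1
print(access(0x11, 4, strict=False))  # LW 0x11 on a permissive machine: 2
```

Calling `access(0x11, 4, strict=True)` raises `AlignmentFault`, which models the strict architectures described above.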

As for the second part of your question, I'm not exactly sure what you mean by block transfer. But if you are accessing more data than fits on the data bus, then yes, it will require more than one cycle.
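
A rough Python sketch of that cycle count, under simplifying assumptions labeled here and in the code: each bus transfer ("beat") carries one aligned `bus_width`-byte chunk, one beat takes one cycle, and arbitration, addressing and latency cycles are ignored. The function name `bus_cycles` is hypothetical.

```python
def bus_cycles(start_addr: int, nbytes: int, bus_width: int = 4) -> int:
    """Count data-bus beats needed to move nbytes starting at start_addr.

    Assumes each beat carries one aligned bus_width-byte chunk and takes
    one cycle; real buses also spend cycles on arbitration, addresses and
    latency, which this sketch ignores.
    """
    if nbytes <= 0:
        return 0
    first_chunk = start_addr // bus_width
    last_chunk = (start_addr + nbytes - 1) // bus_width
    return last_chunk - first_chunk + 1


print(bus_cycles(0x10, 4))   # one aligned word on a 4-byte bus: 1
print(bus_cycles(0x10, 64))  # 64-byte block on a 4-byte bus: 16
print(bus_cycles(0x11, 4))   # unaligned word straddles two chunks: 2
```

Any block larger than the bus width needs several beats, and a misaligned start can add one more.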
