What is locality of reference?


Problem description

I am having trouble understanding locality of reference. Can anyone please help me understand what it means, and what the following are:

  • Spatial locality of reference
  • Temporal locality of reference

Answer

This would not matter if your computer were filled with super-fast memory.

But unfortunately that's not the case, and computer memory looks something like this¹:

+----------+
| CPU      |  <<-- Our beloved CPU, superfast and always hungry for more data.
+----------+
|L1 - Cache|  <<-- ~4 CPU-cycles access latency (very fast), 2 loads/clock throughput
+----------+
|L2 - Cache|  <<-- ~12 CPU-cycles access latency (fast)
+----+-----+
     |
+----------+
|L3 - Cache|  <<-- ~35 CPU-cycles access latency (medium)
+----+-----+       (usually shared between CPU-cores)
     |
     |   <<-- This thin wire is the memory bus, it has limited bandwidth.
+----+-----+
| main-mem |  <<-- ~100 CPU-cycles access latency (slow)
+----+-----+  <<-- The main memory is big but slow (because we are cheap-skates)
     |
     |   <<-- Even slower wire to the harddisk
+----+-----+
| harddisk | <<-- Works at 0.001% of CPU speed
+----------+ 

Spatial locality
In this diagram, the closer data is to the CPU, the faster the CPU can get at it.
This is related to spatial locality. Data has spatial locality if it is located close together in memory.
Because of the cheap-skates that we are, RAM is not really Random Access Memory; it is really "Slow If Random, Less Slow If Accessed Sequentially" Memory (SIRLSIAS-AM). DDR SDRAM transfers a whole burst of 32 or 64 bytes for one read or write command.
That is why it is smart to keep related data close together, so you can do a sequential read of a bunch of data and save time.

Temporal locality
Data stays in main memory, but it cannot all stay in the cache, or the cache would stop being useful. Only the most recently used data can be found in the cache; old data gets pushed out.
This is related to temporal locality. Data has strong temporal locality if pieces of it are accessed close together in time.
This is important because if item A is in the cache (good), then item B (with strong temporal locality to A) is very likely to also be in the cache.

Footnote 1:

This is a simplification, with latency cycle counts estimated from various CPUs for example purposes, but it gives you the right order-of-magnitude idea for typical CPUs.

In reality latency and bandwidth are separate factors, and latency is harder to improve for memory farther from the CPU. But hardware prefetching and/or out-of-order execution can hide latency in some cases, like looping over an array. With unpredictable access patterns, effective memory throughput can be much lower than 10% of L1d cache.

For example, L2 cache bandwidth is not necessarily 3x worse than L1d bandwidth. (But it is lower if you're using AVX SIMD to do 2x 32-byte loads per clock cycle from L1d on a Haswell or Zen2 CPU.)

This simplified version also leaves out TLB effects (page-granularity locality) and DRAM-page locality (not the same thing as virtual-memory pages). For a much deeper dive into memory hardware and tuning software for it, see What Every Programmer Should Know About Memory?

Related: Why is the size of L1 cache smaller than that of the L2 cache in most of the processors? explains why a multi-level cache hierarchy is necessary to get the combination of latency/bandwidth and capacity (and hit rate) we want.

One huge, fast L1 data cache would be prohibitively power-expensive, and even then it could not match the low latency of the small, fast L1d caches in modern high-performance CPUs.

In multi-core CPUs, the L1i/L1d and L2 caches are typically per-core private caches, with a shared L3 cache. Different cores have to compete with each other for L3 and memory bandwidth, but each has its own L1 and L2 bandwidth. See How can cache be that fast? for a benchmark result from a dual-core 3 GHz IvyBridge CPU: 186 GB/s of aggregate L1d cache read bandwidth across both cores vs. 9.6 GB/s of DRAM read bandwidth with both cores active. (So memory = 10% of L1d for a single core is a good bandwidth estimate for desktop CPUs of that generation, which have only 128-bit SIMD load/store data paths.) And L1d latency of 1.4 ns vs. DRAM latency of 72 ns.
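None of the following appears in the original answer; it is a minimal C sketch (the array size, variable names, and the use of clock_gettime are all my own illustrative choices) for observing this gap on your own machine. It sums the same ~64 MiB array once in sequential order and once in a shuffled order, so the only difference between the two timings is the access pattern; on a typical desktop the random pass runs several times slower:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 24)   /* 16 Mi ints (~64 MiB), far bigger than L3 */

static double seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void) {
    int *data = malloc((size_t)N * sizeof *data);
    size_t *idx = malloc((size_t)N * sizeof *idx);
    if (!data || !idx) return 1;

    for (size_t i = 0; i < N; i++) { data[i] = (int)i; idx[i] = i; }
    /* Fisher-Yates shuffle to build a random visiting order;
       rand() is crude but fine for an illustration. */
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);
        size_t t = idx[i]; idx[i] = idx[j]; idx[j] = t;
    }

    long long sum = 0;
    double t0 = seconds();
    for (size_t i = 0; i < N; i++) sum += data[i];        /* sequential */
    double t1 = seconds();
    for (size_t i = 0; i < N; i++) sum += data[idx[i]];   /* random */
    double t2 = seconds();

    printf("sequential: %.3f s, random: %.3f s (sum=%lld)\n",
           t1 - t0, t2 - t1, sum);
    free(data); free(idx);
    return 0;
}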
