Memory hierarchy latency information

Problem description

In the "Example" section of this post, the author lists the latencies of all the memory components (register/L1/L2/RAM)... My question is: how do you measure (or find online) the real latencies for any given chip? Let's say:

model name  : Intel(R) Core(TM)2 Duo CPU     E4600  @ 2.40GHz
stepping    : 13
cpu MHz     : 1200.000

I've tried digging up the information from the Intel manuals as well, but those things are huge, and for the life of me I wouldn't know where to look.

Thanks.

Recommended answer

A simple Google query ("intel cpu cache latency") turns up an interesting Intel research paper: Measuring Cache and Memory Latency and CPU to Memory Bandwidth. In this paper, the authors use LMbench to perform the measurements.

How to Take Measurements

Use the executable binary file called "lat_mem_rd" found in the "bin" folder of the utility’s directory. Next, use the following command line:

taskset 0x1 ./lat_mem_rd -N [x] -P [y] [depth] [stride]

Where [x] equals the number of times the process is run before reporting latency. Typically, setting this to ‘1’ is sufficient for accurate measurements. For the ‘-P’ option, [y] equals the number of processes invoked to run the benchmark. The recommendation for this is always ‘1’: it is sufficient to measure the access latency with only one processing core or thread. The [depth] specification indicates how far into memory the utility will measure. To ensure an accurate measurement, specify an amount far enough beyond the cache size that the cache does not factor into the latency measurement.
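
A minimal Python sketch that assembles the command line above and, optionally, runs it. The function names and the default values (64 for [depth], 128 for [stride]) are illustrative assumptions, as is the reading of [depth] as megabytes and [stride] as bytes, which is our understanding of LMbench's lat_mem_rd usage; verify it against the man page of your LMbench version and pick a [depth] well beyond your largest cache.

```python
from __future__ import annotations

import shlex
import subprocess


def lat_mem_rd_command(runs: int = 1, procs: int = 1, depth_mb: int = 64,
                       stride_bytes: int = 128, cpu_mask: str = "0x1",
                       binary: str = "./lat_mem_rd") -> list[str]:
    """Assemble: taskset <mask> ./lat_mem_rd -N <x> -P <y> <depth> <stride>."""
    return [
        "taskset", cpu_mask,   # pin to CPU 0 (bit mask 0x1)
        binary,
        "-N", str(runs),       # [x]: runs before latency is reported; '1' is enough
        "-P", str(procs),      # [y]: processes running the benchmark; keep at '1'
        str(depth_mb),         # [depth]: assumed MB; go well beyond the largest cache
        str(stride_bytes),     # [stride]: assumed bytes between successive accesses
    ]


def run_lat_mem_rd(cmd: list[str]) -> str:
    """Run the benchmark and return all of its text output.

    Both streams are captured because the results may be written to stderr.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout + result.stderr


if __name__ == "__main__":
    cmd = lat_mem_rd_command()
    print(shlex.join(cmd))  # taskset 0x1 ./lat_mem_rd -N 1 -P 1 64 128
    # print(run_lat_mem_rd(cmd))  # uncomment on a machine with LMbench built
```

Building an argument list instead of a shell string keeps each parameter separate, so no shell quoting is involved.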

Since L1 and L2 cache latency is tied to the core clock, CPU frequency determines how fast these memory accesses happen in real time, while the number of core clocks they take stays the same regardless of the core frequency. For a comparable result, it is best to convert the latency given by LMbench from nanoseconds into CPU clocks. To do this, multiply the latency by the processor frequency.

Time(seconds) * Frequency(Hz) = Clocks of latency
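
The formula needs the frequency the core was actually running at. The excerpt quoted in the question appears to come from Linux's /proc/cpuinfo; this Python sketch (function names are illustrative) pulls the rated clock out of the model name and the current clock out of `cpu MHz`. In that excerpt the two disagree (2.40 GHz versus 1200 MHz), presumably because frequency scaling had lowered the clock when the file was read, so check which value applied while the benchmark ran.

```python
from __future__ import annotations

import re


def parse_cpuinfo(text: str) -> dict[str, str]:
    """Collect the 'key : value' fields of the first processor block."""
    info: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            if info:  # a blank line ends the first processor's block
                break
            continue
        key, sep, value = line.partition(":")
        if sep:
            info.setdefault(key.strip(), value.strip())
    return info


def rated_ghz(model_name: str) -> float | None:
    """Extract the rated clock from an Intel model string such as '... @ 2.40GHz'."""
    match = re.search(r"@\s*([\d.]+)\s*GHz", model_name)
    return float(match.group(1)) if match else None


# The excerpt from the question; on a live system use open("/proc/cpuinfo").read().
sample = """model name  : Intel(R) Core(TM)2 Duo CPU     E4600  @ 2.40GHz
stepping    : 13
cpu MHz     : 1200.000"""

info = parse_cpuinfo(sample)
print(rated_ghz(info["model name"]))  # 2.4
print(float(info["cpu MHz"]) / 1000)  # 1.2
```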

Therefore, if a 2.4 GHz processor takes 17 ns to access a certain level of cache, this converts to:

17 x 10^-9 seconds * 2400000000 Hz = 17 ns * 2.4 GHz = 40.8 ≈ 41 Clocks
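
The same conversion as a small Python sketch; `ns_to_clocks` is an illustrative name, not part of LMbench. It applies Time(seconds) * Frequency(Hz) to the example of 17 ns on a 2.4 GHz processor:

```python
def ns_to_clocks(latency_ns: float, freq_hz: float) -> float:
    """Convert a latency in nanoseconds to core clocks: seconds * Hz."""
    return latency_ns * 1e-9 * freq_hz


clocks = ns_to_clocks(17, 2.4e9)
print(f"{clocks:.1f} clocks, about {round(clocks)}")  # 40.8 clocks, about 41
```

The frequency passed in must be the one the core actually ran at during the measurement: the same 17 ns converted at the 1200 MHz shown in the question's cpuinfo would give 20.4 clocks.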
