Cache bandwidth per tick for modern CPUs

Question

How fast is cache access on modern CPUs? How many bytes can be read from or written to memory per processor clock tick on Intel P4, Core 2, Core i7, and AMD processors?

Please answer with both theoretical numbers (the width of the load/store unit and its throughput in uOPs/tick) and practical numbers (even memcpy speed tests or the STREAM benchmark), if any.

PS: This question is about the maximal rate of load/store instructions in assembler. There is a theoretical load rate (every instruction issued per tick is the widest possible load), but the processor can sustain only part of that, which is the practical load limit.

Recommended answer

For Nehalem: rolfed.com/nehalem/nehalemPaper.pdf

Each core in the architecture has a 128-bit write port and a
128-bit read port to the L1 cache. 

128 bits = 16 bytes/clock read, and 128 bits = 16 bytes/clock write. (Can a read and a write be combined in a single cycle?)
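To turn the per-clock figures into bytes per second, multiply by the core clock. The following is a minimal sketch of that arithmetic. The 128-bit port width comes from the quoted paper; the 3.0 GHz clock and the function name are assumptions made for illustration, and the two-port case only shows what the number would be if a read and a write could issue in the same cycle, which the answer leaves open.

```python
def peak_bandwidth_gb_per_s(port_bits: int, clock_ghz: float, ports: int = 1) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes).

    port_bits -- width of one cache port in bits
    clock_ghz -- core clock in GHz (cycles per nanosecond)
    ports     -- number of ports assumed busy in every cycle
    """
    bytes_per_cycle = (port_bits // 8) * ports
    # bytes/cycle * (1e9 cycles/s per GHz) = 1e9 bytes/s = GB/s
    return bytes_per_cycle * clock_ghz


# Assumption: a hypothetical 3.0 GHz core clock, not taken from the source.
CLOCK_GHZ = 3.0

read_only = peak_bandwidth_gb_per_s(128, CLOCK_GHZ)
read_plus_write = peak_bandwidth_gb_per_s(128, CLOCK_GHZ, ports=2)

print(f"L1 read port alone:            {read_only:.1f} GB/s")
print(f"L1 read + write in same cycle: {read_plus_write:.1f} GB/s (only if both can issue together)")
```

These are upper bounds derived from port width alone; a measured memcpy or STREAM result would normally come in lower.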

The L2 and L3 caches each have a 256-bit port for reading or writing, 
but the L3 cache must share its port with three other cores on the chip.

Can the L2 and L3 read and write ports be used in a single clock?

Each integrated memory controller has a theoretical bandwidth
peak of 32 Gbps.

Latency (in clock ticks), some of it measured by CPU-Z's latency tool or by lmbench's lat_mem_rd. Both use a long linked-list walk to measure latency correctly on modern out-of-order cores such as the Intel Core i7.

CPU        L1     L2     L3 (cycles)  Memory          Source
Core 2      3     15     --           66 ns           http://www.anandtech.com/show/2542/5
Core i7-xxx 4     11     39          40c+67ns         http://www.anandtech.com/show/2542/5
Itanium     1     5-6    12-17       130-1000 (cycles)
Itanium2    2     6-10   20          35c+160ns        http://www.7-cpu.com/cpu/Itanium2.html
AMD K8            12                 40-70c +64ns     http://www.anandtech.com/show/2139/3
Intel P4    2     19     43          200-210 (cycles) http://www.arsc.edu/files/arsc/phys693_lectures/Performance_I_Arch.pdf
AthlonXP 3k 3     20                 180 (cycles)     --//--
AthlonFX-51 3     13                 125 (cycles)     --//--
POWER4      4     12-20  ??          hundreds cycles  --//--
Haswell     4     11-12  36          36c+57ns         http://www.realworldtech.com/haswell-cpu/5/    
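Several memory entries in the table mix units, for example "40c+67ns": a part that scales with the core clock plus a fixed part in nanoseconds. A small sketch of how such an entry could be converted to a single unit follows. The 3.2 GHz clock is a hypothetical value chosen for illustration, and the function names are not from any tool mentioned here.

```python
def total_latency_cycles(core_cycles: float, fixed_ns: float, clock_ghz: float) -> float:
    """Express an 'Nc + M ns' latency entirely in core cycles.

    One nanosecond lasts clock_ghz cycles, so the fixed part is fixed_ns * clock_ghz.
    """
    return core_cycles + fixed_ns * clock_ghz


def total_latency_ns(core_cycles: float, fixed_ns: float, clock_ghz: float) -> float:
    """Express an 'Nc + M ns' latency entirely in nanoseconds."""
    return core_cycles / clock_ghz + fixed_ns


# Table entry for Core i7: 40c + 67ns. The clock below is an assumption.
CLOCK_GHZ = 3.2

print(f"{total_latency_cycles(40, 67, CLOCK_GHZ):.1f} cycles")
print(f"{total_latency_ns(40, 67, CLOCK_GHZ):.1f} ns")
```

The same memory access therefore costs more cycles on a faster core, which is why the pure-cycle entries in the table are only comparable at the clock speed they were measured at.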

Another good source of latency data is the 7-cpu website, e.g. for Haswell: http://www.7-cpu.com/cpu/Haswell.html

More about the lat_mem_rd program can be found in its man page or on SO.
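The linked-list walk mentioned above works because every load depends on the result of the previous one, so an out-of-order core cannot overlap them. Below is a rough sketch of that structure only. Visiting the slots in random order is an assumption made here to defeat prefetching and is not necessarily what CPU-Z or lat_mem_rd do. In Python the interpreter overhead dominates the timing, so the printed number is not a cache latency; a real measurement needs C or assembler.

```python
import random
import time


def build_random_cycle(n: int, seed: int = 0) -> list[int]:
    """Return a 'next index' table forming one cycle through all n slots.

    Uses Sattolo's algorithm: swapping slot i only with a slot j < i
    yields a permutation that consists of a single cycle.
    """
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt: list[int], steps: int, start: int = 0) -> int:
    """Follow the chain for `steps` hops; each hop needs the previous result."""
    p = start
    for _ in range(steps):
        p = nxt[p]
    return p


def time_per_hop_ns(n: int, steps: int) -> float:
    """Average wall-clock time per hop in ns (interpreter overhead included)."""
    nxt = build_random_cycle(n)
    t0 = time.perf_counter()
    chase(nxt, steps)
    return (time.perf_counter() - t0) / steps * 1e9


print(f"{time_per_hop_ns(1 << 16, 200_000):.1f} ns per hop (Python overhead dominates)")
```

In a native version the table would be sized to fit in L1, L2, L3, or main memory in turn, and the time per hop at each size gives the latency of that level.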
