Skylake L2 cache enhanced by reducing associativity?

Question

In Intel's optimization guide, section 2.1.3, they list a number of enhancements to the caches and memory subsystem in Skylake (emphasis mine):


The cache hierarchy of the Skylake microarchitecture has the following enhancements:


  • Higher Cache bandwidth compared to previous generations.
  • Simultaneous handling of more loads and stores enabled by enlarged buffers.
  • Processor can do two page walks in parallel compared to one in Haswell microarchitecture and earlier generations.
  • Page split load penalty down from 100 cycles in previous generation to 5 cycles.
  • L3 write bandwidth increased from 4 cycles per line in previous generation to 2 per line.
  • Support for the CLFLUSHOPT instruction to flush cache lines and manage memory ordering of flushed data using SFENCE.
  • Reduced performance penalty for a software prefetch that specifies a NULL pointer.
  • L2 associativity changed from 8 ways to 4 ways.

The final one caught my eye. In what way is a reduction in the number of ways an enhancement? By itself, it seems that fewer ways is strictly worse than more ways. Of course, I get that there might be valid engineering reasons why a reduction in the number of ways could be a tradeoff that enables other enhancements, but here it is positioned, by itself, as an enhancement.

What am I missing?

Answer

It's strictly worse for performance of the L2 cache.
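To make "fewer ways hurts the L2 itself" concrete, here is a toy model, not Intel's actual design: a 256 KiB cache with 64-byte lines and true LRU replacement within each set, indexed directly by the given address with no hashing and no prefetching. Skylake's real L2 replacement policy is not documented as true LRU, so treat this purely as an illustration. The access pattern cycles through five lines spaced 64 KiB apart, which all land in set 0 in both configurations: 8 ways can hold all five, while 4 ways thrash.

```python
from collections import OrderedDict

LINE = 64  # bytes per cache line (assumed)


class SetAssocLRU:
    """Toy set-associative cache with true LRU replacement per set."""

    def __init__(self, size_bytes, ways, line=LINE):
        self.line = line
        self.ways = ways
        self.num_sets = size_bytes // (line * ways)
        # One OrderedDict per set: tag -> None, ordered least- to most-recently used.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        block = addr // self.line
        idx = block % self.num_sets   # which set the line maps to
        tag = block // self.num_sets  # identifies the line within that set
        s = self.sets[idx]
        if tag in s:
            s.move_to_end(tag)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            if len(s) >= self.ways:
                s.popitem(last=False)  # evict the least recently used line
            s[tag] = None


def run(ways, addrs, rounds=100, size=256 * 1024):
    """Replay `addrs` `rounds` times and return (hits, misses)."""
    c = SetAssocLRU(size, ways)
    for _ in range(rounds):
        for a in addrs:
            c.access(a)
    return c.hits, c.misses


# Five lines spaced 64 KiB apart map to set 0 in both the
# 8-way (512 sets) and 4-way (1024 sets) configurations.
STRIDE = 64 * 1024
addrs = [i * STRIDE for i in range(5)]

for ways in (8, 4):
    hits, misses = run(ways, addrs)
    print(f"{ways}-way: hits={hits} misses={misses}")
```

This prints `8-way: hits=495 misses=5` and `4-way: hits=0 misses=500`: same capacity, but five hot lines competing for one set is a conflict-miss pattern that only the 8-way version survives.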

According to an AnandTech writeup of SKL-SP (aka skylake-avx512 or SKL-X), Intel has stated that "the main reason [for reducing associativity] was to make the design more modular". Skylake-AVX512 has 1MiB of L2 cache with 16-way associativity.
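One possible reading of "more modular" follows from the cache geometry. This is our own arithmetic, not something Intel or AnandTech stated. Assuming 64-byte lines, the 4-way 256 KiB client L2 and the 16-way 1 MiB Skylake-AVX512 L2 both have 1024 sets, so they index with the same address bits and the bigger cache simply has more ways. The old 8-way 256 KiB L2 had only 512 sets. Whether this is the modularity Intel meant is an assumption.

```python
LINE = 64  # bytes per cache line (assumed)


def geometry(size_bytes, ways, line=LINE):
    """Return (sets, offset_bits, index_bits); assumes power-of-two sizes."""
    sets = size_bytes // (line * ways)
    offset_bits = line.bit_length() - 1  # log2(line size)
    index_bits = sets.bit_length() - 1   # log2(number of sets)
    return sets, offset_bits, index_bits


configs = {
    "256 KiB, 8-way (previous generation)": (256 * 1024, 8),
    "256 KiB, 4-way (Skylake client)": (256 * 1024, 4),
    "1 MiB, 16-way (Skylake-AVX512)": (1024 * 1024, 16),
}

for name, (size, ways) in configs.items():
    sets, off, idx = geometry(size, ways)
    print(f"{name}: {sets} sets, index = address bits {off}..{off + idx - 1}")
```

The last two lines of output both report 1024 sets indexed by address bits 6..15, while the 8-way 256 KiB configuration uses bits 6..14.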

Presumably the drop to 4-way associativity doesn't hurt too badly in the dual and quad-core laptop and desktop parts (SKL-S), since there's lots of bandwidth to L3 cache. I think if Intel's simulations and testing had found that it hurt a lot, they would have put in the extra design time to keep the 8-way 256k cache on non-AVX512 Skylake.

The upside of lower associativity is power budget. It could indirectly help performance by allowing more turbo headroom, but mostly they did it to improve efficiency, NOT to improve speed. Freeing up some room in the power budget allows them to spend it elsewhere. Or not to spend all of it, and just use less power.

Mobile and many-core-server CPUs care a lot about power budget, much more than high-end quad-core desktop CPUs.

The heading on the list should more accurately read "changes", not "enhancements", but I'm sure the marketing department wouldn't let them write anything that didn't sound positive. :P At least Intel documents things accurately and in detail, including the ways new CPUs are worse than older designs.

Anandtech's SKL writeup (http://www.anandtech.com/show/9582/intel-skylake-mobile-desktop-launch-architecture-analysis/5) suggests that dropping the associativity freed up power budget to increase L2 bandwidth, which (in the big picture) compensates for the increased miss rate.

IIRC, Intel has a policy that any proposed design change must have a 2:1 ratio of perf gain to power cost, or something like that. So presumably if they lost 1% performance but saved 3% power with this L2 change, they'd do it. The 2:1 number might be correct, if I'm remembering this correctly, but the 1% and 3% examples are totally made up.
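The direction of that ratio is easy to get backwards. Here is a sketch that assumes the rule means "a feature must buy at least 2% performance for each 1% of power it costs". That is one reading of the half-remembered 2:1 figure, not a documented Intel rule. It treats "keep the 8-way L2" as the feature under review and plugs in the answer's explicitly made-up 1% and 3% numbers.

```python
def passes_rule(perf_gain_pct, power_cost_pct, required_ratio=2.0):
    """Hypothetical rule: keep a feature only if it buys at least
    `required_ratio` percent performance per percent of power it costs."""
    return perf_gain_pct >= required_ratio * power_cost_pct


# "Keep the 8-way L2" as the feature: +1% performance for +3% power
# (made-up numbers, as in the answer).
keep_8way = passes_rule(perf_gain_pct=1.0, power_cost_pct=3.0)
print("keep 8-way L2?", keep_8way)  # False, because 1% < 2 * 3%
```

Under this reading, the 8-way L2 fails the bar, so switching to the 4-way design is consistent with the policy.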

There was some discussion of this change in one of the podcast interviews David Kanter did right after details were released at IDF. IDK if this is the right link.
