Does the Harvard architecture have the von Neumann bottleneck?


Question


From the naming and this article I feel the answer is no, but I don't understand why. The bottleneck is how fast you can fetch data from memory. Whether you can fetch instructions at the same time doesn't seem to matter. Don't you still have to wait until the data arrive? Suppose fetching data takes 100 CPU cycles and executing an instruction takes 1: the ability to do that 1 cycle in advance doesn't seem like a huge improvement. What am I missing here?

Context: I came across this article saying the Spectre bug is not going to be fixed because of speculative execution. I think speculative execution, for example branch prediction, makes sense for the Harvard architecture too. Am I right? I understand speculative execution is more beneficial for the von Neumann architecture, but by how much? Can someone give a rough number? To what extent can we say Spectre will stay because of the von Neumann architecture?

Solution

The term "von Neumann bottleneck" isn't talking about Harvard vs. von Neumann architectures. It's talking about the entire idea of stored-program computers, which John von Neumann invented.

It applies equally to both kinds of stored-program computers. And even to fixed-function (not stored-program) processors that keep data in RAM. (Old GPUs without programmable shaders are basically fixed-function but can still have memory bottlenecks accessing data).

Usually it's most relevant when looping over big arrays or pointer-based data structures like linked lists, so the code fits in an instruction cache and doesn't have to be fetched during data access anyway. (Computers too old to even have caches were just plain slow, and I'm not interested in arguing semantics of whether slowness even when there is temporal and/or spatial locality is a von Neumann bottleneck for them or not.)
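As a sketch of why the access pattern matters more than the instruction stream, here is the same reduction written as a linear array scan and as a pointer chase. This is an illustration of the two access patterns, not a benchmark (the `Node` class and the sizes are assumptions for the example); the point is that the tiny loop bodies stay in the instruction cache either way, while the data side behaves very differently: array elements sit at sequential addresses that a hardware prefetcher can stream in, whereas each `.next` load depends on the previous one, so its latency can't be hidden.

```python
# Illustrative sketch (not a benchmark): one reduction, two data-access
# patterns. The loop code is small and stays in the instruction cache;
# only the data side differs.

class Node:
    __slots__ = ("value", "next")
    def __init__(self, value):
        self.value = value
        self.next = None

def sum_array(values):
    # Sequential addresses: hardware prefetchers can stream these in,
    # hiding most of the memory latency.
    total = 0
    for v in values:
        total += v
    return total

def sum_list(head):
    # Each load of .next depends on the previous load, so the latency
    # of every access is exposed: the memory bottleneck at its worst,
    # regardless of Harvard vs. von Neumann instruction fetch.
    total = 0
    node = head
    while node is not None:
        total += node.value
        node = node.next
    return total

values = list(range(1000))
head = None
for v in reversed(values):
    node = Node(v)
    node.next = head
    head = node

assert sum_array(values) == sum_list(head) == 499500
```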

https://whatis.techtarget.com/definition/von-Neumann-bottleneck points out that caching and prefetching are part of how we work around the von Neumann bottleneck, and that faster / wider buses make the bottleneck wider. But only stuff like Processor-in-Memory / https://en.wikipedia.org/wiki/Computational_RAM truly solves it, where an ALU is attached to memory cells directly, so there is no central bottleneck between computation and storage, and computational capacity scales with storage size. But von Neumann with a CPU and separate RAM works well enough for most things that it's not going away any time soon (given large caches and smart hardware prefetching, and out-of-order execution and/or SMT to hide memory latency.)


John von Neumann was a pioneer in early computing, and it's not surprising his name is attached to two different concepts.

Harvard vs. von Neumann is about whether program memory is in a separate address space (and a separate bus); that's an implementation detail for stored-program computers.


Spectre: yes, Spectre is just about data access. If you can get a Spectre attack into program memory in a Harvard architecture in the first place, then it can run the same as on a von Neumann.

"I understand speculative execution is more beneficial for von Neumann architecture, but by how much?"

What? No. There's no connection here at all. Of course, all high-performance modern CPUs are von Neumann. (With split L1i / L1d caches, but program and data memory are not separate: they share the same address space and physical storage. Split L1 caches are often called "modified Harvard", which makes some sense on ISAs other than x86, where L1i isn't coherent with data caches, so you need special flushing instructions before you can execute newly-stored bytes as code. x86 has coherent instruction caches, so there it's very much an implementation detail.)

Some embedded CPUs are true Harvard, with program memory connected to Flash and data address space mapped to RAM. But often those CPUs are pretty low performance. Pipelined but in-order, and only using branch prediction for instruction prefetch.

But if you did build a very high-performance CPU with fully separate program and data memories (so copying from one to the other would have to go through the CPU), it would be basically no different from modern high-performance CPUs. L1i cache misses are rare, and whether instruction fetch competes with data fetch is not at all significant.
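A back-of-envelope calculation with the asker's own numbers (100-cycle memory, 1-cycle execute, one data access per instruction; these are illustrative assumptions, not measurements) makes the same point: a separate instruction bus is roughly a 2x win only on a cacheless machine, and a modestly effective instruction cache closes almost the entire gap.

```python
# Back-of-envelope cycle counts per instruction, using the question's
# illustrative numbers (not measurements of any real CPU).
MEM = 100   # cycles for one memory access
EXE = 1     # cycles to execute one instruction

# No caches, one shared bus: code fetch, then data fetch, then execute.
von_neumann_no_cache = MEM + MEM + EXE          # 201 cycles

# No caches, separate Harvard buses: code fetch overlaps the data fetch.
harvard_no_cache = MEM + EXE                    # 101 cycles

# Shared bus, but an instruction cache that almost always hits
# (assume a 1-cycle fetch): data latency dominates either way.
ICACHE_HIT = 1
von_neumann_cached = ICACHE_HIT + MEM + EXE     # 102 cycles

print(von_neumann_no_cache / harvard_no_cache)  # ~1.99: big win without caches
print(von_neumann_cached / harvard_no_cache)    # ~1.01: the icache closes the gap
```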

I guess you'd have split caches all the way down, though; normally modern CPUs have unified L2 and L3 caches, so depending on the workload (big code size or not) more or less of L2 and L3 can end up holding code. Maybe you'd still do that with one extra bit in the tag to distinguish code addresses from data addresses.
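The "one extra bit in the tag" idea can be sketched as a toy model: a unified cache whose lookup key includes an is_code flag, so code and data lines share the same capacity but a data access can never hit a code line at the same address. The class name, methods, and payloads below are hypothetical, purely for illustration; they don't model any real CPU's cache.

```python
# Hypothetical toy model of a unified cache whose tags carry an extra
# is_code bit, so code and data addresses are distinguished while still
# sharing one pool of storage.

class UnifiedCache:
    def __init__(self):
        # Key is (address, is_code): the extra bit is part of the tag match.
        self.lines = {}

    def fill(self, addr, payload, is_code):
        self.lines[(addr, is_code)] = payload

    def lookup(self, addr, is_code):
        # A data access can never hit a code line at the same address,
        # and vice versa, because the is_code bit must match too.
        return self.lines.get((addr, is_code))

cache = UnifiedCache()
cache.fill(0x1000, "mov eax, 42", is_code=True)       # a "code" line
cache.fill(0x1000, b"\x2a\x00\x00\x00", is_code=False)  # a "data" line

assert cache.lookup(0x1000, is_code=True) == "mov eax, 42"
assert cache.lookup(0x1000, is_code=False) == b"\x2a\x00\x00\x00"
assert cache.lookup(0x2000, is_code=True) is None  # miss: not filled
```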

