关于 GHC 实施的好的介绍性文字? [英] Good introductory text about GHC implementation?

查看:18
本文介绍了关于 GHC 实施的好的介绍性文字?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在 Haskell 中编程时(尤其是在解决 Project Euler 问题时,次优的解决方案往往会对 CPU 或内存需求造成压力),我经常对为什么程序的行为方式感到困惑.我查看配置文件,尝试引入一些严格性,选择另一种数据结构,......但大多是在黑暗中摸索,因为我缺乏良好的直觉.

另外,虽然我知道 Lisp、Prolog 和命令式语言的典型实现方式,但我不知道如何实现惰性语言.我也有点好奇.

因此,我想更多地了解从程序源到执行模型的整个链条.

我想知道的事情:

  • 应用了哪些典型的优化?

  • 当有多个评估候选者时,执行顺序是什么(虽然我知道它是由所需的输出驱动的,但先评估 A 再评估 B 或先评估 B 以检测可能仍然存在很大的性能差异你根本不需要A)

  • thunk 是如何表示的?

  • 栈和堆是如何使用的?

  • 什么是 CAF?(分析有时表明热点在那里,但我不知道)

解决方案

关于 GHC 系统的架构和方法的大部分技术信息都在他们的 wiki 中.我将链接到关键部分,以及一些人们可能不知道的相关论文.

应用了哪些典型优化?

这方面的关键论文是:.具体而言,请参阅thunk 的表示方式.

栈和堆是如何使用的?

Spineless Tagless G-machine 的设计,具体来说,自那篇论文发表以来进行了许多修改.概括地说,执行模型:

  • (装箱的)对象在全局堆上分配;
  • 每个线程对象都有一个栈,包括与堆对象具有相同布局的帧数;
  • 进行函数调用时,将值压入堆栈并跳转到函数;
  • 如果代码需要分配,例如构造函数,该数据被放置在堆上.

要深入了解堆栈使用模型,请参阅 Push/Enter 与 Eval/Apply".

什么是 CAF?

恒定应用形式".例如.为程序执行的整个生命周期分配的程序中的顶级常量.由于它们是静态分配的,因此它们必须是 由垃圾收集器特殊处理.

<小时>

参考资料和延伸阅读:

When programming in Haskell (and especially when solving Project Euler problems, where suboptimal solutions tend to stress the CPU or memory needs) I'm often puzzled why the program behaves the way it is. I look at profiles, try to introduce some strictness, chose another data structure, ... but mostly it's groping in the dark, because I lack a good intuition.

Also, while I know how Lisp, Prolog and imperative languages are typically implemented, I have no idea about implementing a lazy language. I'm a bit curious too.

Hence I would like to know more about the whole chain from program source to execution model.

Things I wonder about:

  • what typical optimizations are applied?

  • what is the execution order when there are multiple candidates for evaluation (while I know it's driven from the needed outputs, there may still be big performance differences between first evaluating A and then B, or evaluating B first to detect that you don't need A at all)

  • how are thunks represented?

  • how are the stack and the heap used?

  • what is a CAF? (profiling indicates sometimes that the hotspot is there, but I have no clue)

解决方案

The majority of the technical information about the architecture and approach of the GHC system is in their wiki. I'll link to the key pieces, and some related papers that people may not know about.

What typical optimizations are applied?

The key paper on this is: A transformation-based optimiser for Haskell, SL Peyton Jones and A Santos, 1998, which describes the model GHC uses of applying type-preserving transformations (refactorings) of a core Haskell-like language to improve time and memory use. This process is called "simplification".

Typical things that are done in a Haskell compiler include:

  • Inlining;
  • Beta reduction;
  • Dead code elimination;
  • Transformation of conditions: case-of-case, case elimiation.
  • Unboxing;
  • Constructed product return;
  • Full laziness transformation;
  • Specialization;
  • Eta expansion;
  • Lambda lifting;
  • Strictness analysis.

And sometimes:

  • The static argument transformation;
  • Build/foldr or stream fusion;
  • Common sub-expression elimination;
  • Constructor specialization.

The above-mentioned paper is the key place to start to understand most of these optimizations. Some of the simpler ones are given in the earlier book, Implementing Functional Languages, Simon Peyton Jones and David Lester.

What is the execution order when there are multiple candidates for evaluation

Assuming you're on a uni-processor, then the answer is "some order that the compiler picks statically based on heuristics, and the demand pattern of the program". If you're using speculative evaluation via sparks, then "some non-deterministic, out-of-order execution pattern".

In general, to see what the execution order is, look at the core, with, e.g. the ghc-core tool. An introduction to Core is in the RWH chapter on optimizations.

How are thunks represented?

Thunks are represented as heap-allocated data with a code pointer.

See the layout of heap objects. Specifically, see how thunks are represented.

How are the stack and the heap used?

As determined by the design of the Spineless Tagless G-machine, specifically, with many modifications since that paper was released. Broadly, the execution model:

  • (boxed) objects are allocated on the global heap;
  • every thread object has a stack, consisting of frames with the same layout as heap objects;
  • when you make a function call, you push values onto the stack and jump to the function;
  • if the code needs to allocate e.g. a constructor, that data is placed on the heap.

To deeply understand the stack use model, see "Push/Enter versus Eval/Apply".

What is a CAF?

A "Constant Applicative Form". E.g. a top level constant in your program allocated for the lifetime of your program's execution. Since they're allocated statically, they have to be treated specially by the garbage collector.


References and further reading:

这篇关于关于 GHC 实施的好的介绍性文字?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆