GHC垃圾收集器/运行时如何知道它可以创建一个`inplace'数组 [英] How does the GHC garbage collector / runtime know that it can create an array `inplace'

查看:119
本文介绍了GHC垃圾收集器/运行时如何知道它可以创建一个`inplace'数组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

例如

  main = do 
let ls = [0..10000000]
print ls

这将使用O(1)内存创建数组'inplace'。



以下编辑会导致程序在执行时耗尽内存。

  main = do 
let ls = [0..10000000]
print ls
print ls


$在这种情况下,必须将b $ b

ls 保存在内存中以再次打印。实际上,重新计算'inplace'数组的效率要比试图保持这种效率更高。尽管如此,我真正的问题是GHC如何以及何时与运行时系统通信,在O(1)时间创建时, ls 可以被破坏?我了解活性分析可以找到这些信息,我只是想知道信息的用途。垃圾收集器是否传递了这个信息?它以某种方式被编辑掉了吗? (如果我查看GHC编译的核心,那么这两个示例都使用 eftInt ,所以如果它是编译器工件,那么它必须发生在更深层次上)。 b
$ b

编辑:我的问题更多的是关于这个优化发生的地方。我想也许是在GC中,这是从汇编步骤中的一些活性检查中提供的一些信息。由于迄今为止的答案,我可能是错的。这是最有可能发生在核心之前的一些较低的水平,所以可能吗?



edit2:这里的大多数答案假设GC知道第一个示例中不再引用> ls ,并在第二个示例中再次引用它。我知道GC的基本知识,并且我知道数组是链表,等等。我的问题正是 GC知道这一点。答案可能仅仅是(a)它从编译器获得额外的信息,或者(b)它不需要知道这一点,编译器100%处理这个信息

解决方案

这个解释可能是一个谎言,特别是因为我正在弥补,但这应该不成问题。
$ b

你所犯的重大错误是假设一个值绑定到一个活动表达式的范围内,这个值就是活的。这是完全错误的。绑定到一个变量的值只有在实际表达式中提到时才会生效。



运行时的工作是非常简单


  1. 执行绑定到 main 的表达式。

  2. 没有2.

我们可以将这个执行看作涉及几个不同的步骤,一遍又一遍地重复:


  1. 找出现在要做的事情。

  2. 找出接下来要做的事。
  3. li>

所以我们从一些 main 表达式开始。从一开始,GC的根集就包含那些在 main 表达式中使用的名称,不是中的内容范围在该表达式中。如果我写了

  foo =嗨! 
main = print再见!

那么既然 foo code> main ,它不在开头的根目录下,因为它甚至没有被 main

现在假设我们有一个更有趣的例子:

  foo =嗨! 
bar =再见!
main = print foo>>打印栏

现在 foo code> main ,所以它开始现场。我们评估 main 为弱头标准格式以找出要做的事情,我们得到大约



<$ p $ (> )(原始操作输出Hi!)>>打印栏

请注意, foo 不再是所以它已经死了!

现在我们执行那个原始操作,打印Hi!,并且我们的待办事项列表被减少到$ / $>

 打印栏

现在我们对WHNF进行评估,大致得到
$ b $ pre $ (原始操作打印再见!)

现在 bar 已死亡。我们打印再见!并退出。






现在考虑您描述的第一个程序:

  main = do 
let ls = [0..10000000]
print ls

这个desugars

  main = 
let ls = [0..10000000]
in print ls

这是我们开始的地方。开始处的根集是表达式的子句中的中提及的所有内容。所以我们在概念上有 ls print 来开始。现在我们可以设想 print ,专用于 [Integer] ,看起来有点像以下内容(这大大简化了,并且会打印出不同的列表,但这并不重要)。

  print xs = case xs of 
[] - > return()
(y:ys)= printInteger y>> print ys

所以当我们开始执行 main (我们现在做什么?之后我们会做什么?),我们正在计算 print ls 。为此,我们在 ls 的第一个构造函数上进行模式匹配,将 ls 强制评估为WHNF。我们发现第二个模式 y:ys ,匹配,因此我们用替换 print ls 打印Integer y>>打印ys ,其中 y 指向 ls 和<$ c $的第一个元素c> ys 指向代表 ls 的第二个列表构造函数的thunk。但请注意, ls 本身现在已经死了!没有什么是指向它!因此,当 print 强制列表中的位时,它已经通过的位已经失效。

你有

  main = 
让ls = ...
in print ls>>打印ls

然后开始执行,首先计算要做的事情( print ls )。你得到

 (printInteger y>> print ys)>> print ls 

除了表达式的第二部分现在指向 LS 。因此,尽管第一部分会逐渐放弃列表中的部分内容,但第二部分将继续保持开始,并保持全部生效。



编辑



让我试着解释一些比 IO 更简单的东西。假设你的程序是一个 [Int] 类型的表达式,运行时系统的工作是打印每一个元素在它自己的行上。所以我们可以写成:

  countup mn = if m == n then [] else m:countup(m + 1)
main = countup 0 1000

运行时系统拥有一个代表应该打印的值的值。我们称之为当前值 whatPrint 。 RTS需要遵循以下流程:


  1. whatPrint 设置为 main

  2. whatPrint 是空的吗?如果是这样,我就完成了,并且可以退出程序。如果没有,这是一个缺点, printNow:whatPrint'

  3. 计算 printNow
  4. whatPrint 设置为 whatPrint' li>
  5. 转到步骤1.

在此模型中,垃圾收集的root set只是 whatPrint



在一个真正的程序中,我们不会产生一个列表;我们产生一个 IO 动作。但是这样的行为也是一种懒惰的数据结构(概念上)。您可以考虑>> = return ,并且每个基元 IO 操作作为 IO 的构造函数。可以认为它是

  data IO :: *  - > *其中
Return :: a - > IO a
Bind :: IO a - > (a→IO b)→> IO b
PrintInt :: Int - > IO()
ReadInt :: IO Int
...

< whatShouldIDo 的初始值是 main ,但它的值会随着时间的推移而变化。只有它直接指向的是根集。没有必要进行神奇的分析。

For example

main = do
  let ls = [0..10000000]
  print ls

This will create the array 'inplace', using O(1) memory.

The following edit causes the program to run out of memory while executing.

main = do
  let ls = [0..10000000]
  print ls
  print ls

ls in this case must be kept in memory to be printed again. It would actually be heaps more memory efficient to recalculate the array again 'inplace' than to try to keep this in place. That's an aside though. My real question is "how and when does GHC communicate to the runtime system that ls can be destroyed while it's created in O(1) time?" I understand that liveness analysis can find this information, I'm just wondering where the information is used. Is it the garbage collector that is passed this info? Is it somehow compiled away? (If I look at the compiled core from GHC then both examples use eftInt, so if it's a compiler artifact then it must happen at a deeper level).

edit: My question was more about finding where this optimization took place. I thought maybe it was in the GC, which was fed some information from some liveness check in the compilation step. Due to the answers so far I'm probably wrong. This is most likely then happening at some lower level before core, so cmm perhaps?

edit2: Most of the answers here assume that the GC knows that ls is no longer referenced in the first example, and that it is referenced again in the second example. I know the basics of GC and I know that arrays are linked lists, etc. My question is exactly HOW the GC knows this. The answer could probably be only (a) it is getting extra information from the compiler, or (b) it doesn't need to know this, that this information is handled 100% by the compiler

解决方案

This explanation is probably a lie, especially because I'm making it up as I go, but that shouldn't be a problem.

The essential mistake you're making is assuming that a value is live if a variable bound to it is in scope in a live expression. This is simply wrong. A value bound to a variable is only live as a result if it is actually mentioned in a live expression.

The job of the runtime is very simple

  1. Execute the expression bound to main.
  2. There is no 2.

We can think of this execution as involving a couple different steps that repeat over and over:

  1. Figure out what to do now.
  2. Figure out what to do next.

So we start with some main expression. From the start, the "root set" for GC consists of those names that are used in that main expression, not the things that are in scope in that expression. If I write

foo = "Hi!"
main = print "Bye!"

then since foo is not mentioned in main, it is not in the root set at the beginning, and since it is not even mentioned indirectly by anything mentioned by main, it is dead right from the start.

Now suppose we take a more interesting example:

foo = "Hi!"
bar = "Bye!"
main = print foo >> print bar

Now foo is mentioned in main, so it starts out live. We evaluate main to weak head normal form to find out what to do, and we get, approximately,

(primitive operation that prints out "Hi!") >> print bar

Note that foo is no longer mentioned, so it is dead!

Now we execute that primitive operation, printing "Hi!", and our "to do list" is reduced to

print bar

Now we evaluate that to WHNF, and get, roughly,

(primitive operation to print "Bye!")

Now bar is dead. We print "Bye!" and exit.


Consider, now, the first program you described:

main = do
  let ls = [0..10000000]
  print ls

This desugars to

main =
  let ls = [0..10000000]
  in print ls

This is where we start. The "root set" at the beginning is everything mentioned in the in clause of the expression. So we conceptually have ls and print to start out. Now we can imagine that print, specialized to [Integer], looks something vaguely like the following (this is greatly simplified, and will print out the list differently, but that really doesn't matter).

print xs = case xs of
  [] -> return ()
  (y:ys) = printInteger y >> print ys

So when we start executing main (What do we do now? What will we do afterwards?), we are trying to calculate print ls. To do this, we pattern match on the first constructor of ls, which forces ls to be evaluated to WHNF. We find the second pattern, y:ys, matches, so we replace print ls with print Integer y >> print ys, where y points to the first element of ls and ys points to the thunk representing the second list constructor of ls. But note that ls itself is now dead! Nothing is pointing to it! So as print forces bits of the list, the bits it has already passed become dead.

In contrast, when you have

main =
  let ls = ...
  in print ls >> print ls

and you start executing, you start by calculating the thing to do first (print ls). You get

(printInteger y >> print ys) >> print ls

Everything is the same, except that the second part of the expression now points to ls. So even though the first part will be dropping pieces of the list as it goes, the second part will keep holding on to the beginning, keeping it all live.

Edit

Let me try explaining with something a little simpler than IO. Pretend that your program is an expression of type [Int], and the job of the runtime system is to print each element on its own line. So we can write

countup m n = if m == n then [] else m : countup (m+1)
main = countup 0 1000

The runtime system holds a value representing everything that it should print. Let's call the "current value" whatPrint. The RTS needs to follow a process:

  1. Set whatPrint to main.
  2. Is whatPrint empty? If so, I'm done, and can exit the program. If not, it is a cons, printNow : whatPrint'.
  3. Calculate printNow and print it.
  4. Set whatPrint to whatPrint'
  5. Go to step 1.

In this model, the "root set" for garbage collection is just whatPrint.

In a real program, we don't produce a list; we produce an IO action. But such an action is also a lazy data structure (conceptually). You can think of >>=, return, and each primitive IO operation as a constructor for IO. Think of it as

data IO :: * -> * where
  Return :: a -> IO a
  Bind :: IO a -> (a -> IO b) -> IO b
  PrintInt :: Int -> IO ()
  ReadInt :: IO Int
  ...

The initial value of whatShouldIDo is main, but its value evolves over time. Only what it points to directly is in the root set. There is no magical analysis necessary.

这篇关于GHC垃圾收集器/运行时如何知道它可以创建一个`inplace'数组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆