GHC 用于垃圾收集的 RTS 选项 [英] GHC's RTS options for garbage collection

查看:15
本文介绍了GHC 用于垃圾收集的 RTS 选项的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个 Haskell 程序,它处理一个文本文件并构建一个 Map(包含数百万个元素).整个过程可以运行2-3分钟.我发现调整 -H 和 -A 选项会对运行时间产生很大影响.

I have a Haskell program which processes a text file and builds a Map (with several million elements). The whole thing can run for 2-3 minutes. I found that tweaking the -H and -A options makes a big difference in running time.

有关 RTS 功能的文档,但对我来说很难阅读,因为我不了解 GC 理论中的算法和术语.我正在寻找一个技术含量较低的解释,最好是针对 Haskell/GHC.有没有关于为这些选项选择合理值的参考资料?

There is documentation about this functionality of the RTS, but it's a hard read for me since I don't know the algorithms and terms from GC theory. I'm looking for a less technical explanation, preferably specific to Haskell/GHC. Are there any references about choosing sensible values for these options?

这就是代码,它为给定的单词列表构建一个 trie.

That's the code, it builds a trie for a given list of words.

buildTrie :: [B.ByteString] -> MyDFA 
buildTrie l = fst3 $ foldl' step (emptyDFA, B.empty, 1) $ sort $ map B.reverse l where
    step :: (MyDFA , B.ByteString, Int) -> B.ByteString -> (MyDFA , B.ByteString, Int)
    step (dfa, lastWord, newIndex) newWord = (insertNewStates, newWord, newIndex + B.length newSuffix) where            
        (pref, lastSuffix, newSuffix) = splitPrefix lastWord newWord
        branchPoint = transStar dfa pref

        --new state labels for the newSuffix path
        newStates = [newIndex .. newIndex + B.length newSuffix - 1]
        --insert newStates
        insertNewStates = (foldl' (flip insertTransition) dfa $ zip3 (branchPoint:init newStates) (B.unpack newSuffix) newStates)

推荐答案

一般来说,垃圾回收是空间/时间的权衡.给 GC 更多的空间,它会花费更少的时间.还有(许多)其他因素在起作用,尤其是缓存,但空间/时间的权衡是最重要的.

Generally speaking, garbage collection is a space/time tradeoff. Give the GC more space, and it will take less time. There are (many) other factors in play, cache in particular, but the space/time tradeoff is the most important one.

权衡如下:程序分配内存直到达到某个限制(由 GC 的自动调整参数决定,或通过 RTS 选项明确决定).当达到限制时,GC 会跟踪程序当前正在使用的所有数据,并回收不再需要的数据使用的所有内存.您可以延迟此过程的时间越长,在此期间无法访问(死")的数据就越多,因此 GC 会避免跟踪该数据.延迟 GC 的唯一方法是分配更多内存;因此更多的内存等于更少的 GC,等于更低的 GC 开销.粗略地说,GHC 的 -H 选项可以让您设置 GC 使用的内存量的下限,因此可以降低 GC 开销.

The tradeoff works like this: the program allocates memory until it reaches some limit (decided by the GC's automatic tuning parameters, or explicitly via RTS options). When the limit is reached, the GC traces all the data that the program is currently using, and reclaims all the memory used by data that is no longer required. The longer you can delay this process, the more data will have become unreachable ("dead") in the meantime, so the GC avoids tracing that data. The only way to delay a GC is to make more memory available to allocate into; hence more memory equals fewer GCs, equals lower GC overhead. Roughly speaking, GHC's -H option lets you set a lower bound on the amount of memory used by the GC, so can lower the GC overhead.

GHC 使用 generational GC,这是对基本方案的优化,其中将堆分为两代或更多代.对象被分配到年轻"代,而存活时间足够长的对象被提升到老"代(在 2 代设置中).年轻代比老年代更频繁地被收集,这个想法是大多数对象都会在年轻时死去",所以年轻代的收集很便宜(它们不会跟踪太多数据),但它们会回收大量内存.粗略地说,-A 选项设置了年轻代的大小——即在收集年轻代之前将分配的内存量.

GHC uses a generational GC, which is an optimisation to the basic scheme, in which the heap is divided into two or more generations. Objects are allocated into the "young" generation, and the ones that live long enough get promoted into the "old" generation (in a 2-generation setup). The young generation is collected much more frequently than the old generation, the idea being that "most objects die young", so young-generation collections are cheap (they don't trace much data), but they reclaim a lot of memory. Roughly speaking, the -A option sets the size of the young generation - that is, the amount of memory that will be allocated before the young generation is collected.

-A 的默认值为 512k:保持年轻代小于 L2 缓存是个好主意,如果超过 L2 缓存大小,性能通常会下降.但是在相反的方向工作是 GC 空间/时间的权衡:使用非常大的年轻代大小可能会通过减少 GC 必须做的工作量来超过缓存的好处.这并不总是发生,它取决于应用程序的动态,这使得 GC 很难自动调整自身.-H 选项还会增加年轻代的大小,因此也会对缓存的使用产生不利影响.

The default for -A is 512k: it's a good idea to keep the young generation smaller than the L2 cache, and performance generally drops if you exceed the L2 cache size. But working in the opposite direction is the GC space/time tradeoff: using a very large young generation size might outweigh the cache benefits by reducing the amount of work the GC has to do. This doesn't always happen, it depends on the dynamics of the application, which makes it hard for the GC to automatically tune itself. The -H option also increases the young generation size, so can also have an adverse effect on cache usage.

底线是:玩转这些选项,看看有什么用.如果您有足够的可用内存,则可以通过使用 -A 或 -H 来提高性能,但不一定.

The bottom line is: play around with the options and see what works. If you have plenty of memory to spare, you might well be able to get a performance increase by using either -A or -H, but not necessarily.

这篇关于GHC 用于垃圾收集的 RTS 选项的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆