GHC的垃圾收集RTS选项 [英] GHC's RTS options for garbage collection

查看:105
本文介绍了GHC的垃圾收集RTS选项的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个Haskell程序处理文本文件并生成一个 Map (有几百万个元素)。整个事情可以运行2-3分钟。我发现调整-H和-A选项对运行时间有很大的影响。



有关RTS的这种功能的文档,但这对我来说很难读,因为我不了解GC理论的算法和术语。我正在寻找一个技术性较差的解释,最好是针对Haskell / GHC。是否有任何关于为这些选项选择合理值的参考?

编辑:
这是代码,它为给定的单词列表建立一个句号。 p>

  buildTrie :: [B.ByteString]  - > MyDFA 
buildTrie l = fst3 $ foldl'step(emptyDFA,B.empty,1)$ sort $ map B.reverse l where
step ::(MyDFA,B.ByteString,Int) - > B.ByteString - > (MyDFA,B.ByteString,Int)
step(dfa,lastWord,newIndex)newWord =(insertNewStates,newWord,newIndex + B.length newSuffix)其中
(pref,lastSuffix,newSuffix)= splitPrefix lastWord newWord
branchPoint = transStar dfa pref

- newSuffix路径的新状态标签
newStates = [newIndex .. newIndex + B.length newSuffix - 1]
--insert newStates
insertNewStates =(foldl'(flip insertTransition)dfa $ zip3(branchPoint:init newStates)(B.unpack newSuffix)newStates)

权衡如下:程序分配内存,直到达到某个限制(由GC的自动调整参数决定,或通过RTS选项显式确定)。达到限制时,GC跟踪程序当前正在使用的所有数据,并回收不再需要的数据所使用的所有内存。您可以延迟此过程的时间越长,在此期间越多的数据将变得无法访问(死亡),因此GC可以避免跟踪该数据。推迟GC的唯一方法是让更多的内存可用于分配;因此更多的内存等于更少的GC,等于更低的GC开销。粗略地说,GHC的-H选项可以让您设置GC使用的内存量的下限,因此可以降低GC开销。

GHC使用 GC ,这是基本方案的一种优化,其中堆分为两代或更多代。对象被分配到年轻一代,并且足够长的那些被提升到旧一代(在2代设置中)。年轻一代的收集比老一代要频繁得多,这个想法是大多数物体年轻,所以年轻一代的收藏便宜(他们没有追踪太多数据),但是它们回收了大量的记忆。粗略地说,-A选项设置年轻一代的规模 - 也就是说,在收集年轻一代之前将分配的内存量。

-A的缺省值为512k:将年轻一代保持在二级缓存之下是一个不错的主意,如果超过L2缓存大小,性能通常会下降。但是在相反的方向上工作是GC空间/时间的折中:通过减少GC的工作量,使用非常大的年轻一代的规模可能会超过缓存的好处。这并不总是会发生,这取决于应用程序的动态性,这使GC很难自动调整自己。 -H选项也增加了年轻一代的规模,因此也可能对缓存使用产生不利影响。



底线是:玩弄选项,看看什么作品。如果您有足够的内存空间,您可以通过使用-A或-H来获得性能提升,但不一定。


I have a Haskell program which processes a text file and builds a Map (with several million elements). The whole thing can run for 2-3 minutes. I found that tweaking the -H and -A options makes a big difference in running time.

There is documentation about this functionality of the RTS, but it's a hard read for me since I don't know the algorithms and terms from GC theory. I'm looking for a less technical explanation, preferably specific to Haskell/GHC. Are there any references about choosing sensible values for these options?

EDIT: That's the code, it builds a trie for a given list of words.

buildTrie :: [B.ByteString] -> MyDFA 
buildTrie l = fst3 $ foldl' step (emptyDFA, B.empty, 1) $ sort $ map B.reverse l where
    step :: (MyDFA , B.ByteString, Int) -> B.ByteString -> (MyDFA , B.ByteString, Int)
    step (dfa, lastWord, newIndex) newWord = (insertNewStates, newWord, newIndex + B.length newSuffix) where            
        (pref, lastSuffix, newSuffix) = splitPrefix lastWord newWord
        branchPoint = transStar dfa pref

        --new state labels for the newSuffix path
        newStates = [newIndex .. newIndex + B.length newSuffix - 1]
        --insert newStates
        insertNewStates = (foldl' (flip insertTransition) dfa $ zip3 (branchPoint:init newStates) (B.unpack newSuffix) newStates)

解决方案

Generally speaking, garbage collection is a space/time tradeoff. Give the GC more space, and it will take less time. There are (many) other factors in play, cache in particular, but the space/time tradeoff is the most important one.

The tradeoff works like this: the program allocates memory until it reaches some limit (decided by the GC's automatic tuning parameters, or explicitly via RTS options). When the limit is reached, the GC traces all the data that the program is currently using, and reclaims all the memory used by data that is no longer required. The longer you can delay this process, the more data will have become unreachable ("dead") in the meantime, so the GC avoids tracing that data. The only way to delay a GC is to make more memory available to allocate into; hence more memory equals fewer GCs, equals lower GC overhead. Roughly speaking, GHC's -H option lets you set a lower bound on the amount of memory used by the GC, so can lower the GC overhead.

GHC uses a generational GC, which is an optimisation to the basic scheme, in which the heap is divided into two or more generations. Objects are allocated into the "young" generation, and the ones that live long enough get promoted into the "old" generation (in a 2-generation setup). The young generation is collected much more frequently than the old generation, the idea being that "most objects die young", so young-generation collections are cheap (they don't trace much data), but they reclaim a lot of memory. Roughly speaking, the -A option sets the size of the young generation - that is, the amount of memory that will be allocated before the young generation is collected.

The default for -A is 512k: it's a good idea to keep the young generation smaller than the L2 cache, and performance generally drops if you exceed the L2 cache size. But working in the opposite direction is the GC space/time tradeoff: using a very large young generation size might outweigh the cache benefits by reducing the amount of work the GC has to do. This doesn't always happen, it depends on the dynamics of the application, which makes it hard for the GC to automatically tune itself. The -H option also increases the young generation size, so can also have an adverse effect on cache usage.

The bottom line is: play around with the options and see what works. If you have plenty of memory to spare, you might well be able to get a performance increase by using either -A or -H, but not necessarily.

这篇关于GHC的垃圾收集RTS选项的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆