改进抓捕实施 [英] Improving treap implementation

查看：148 发布时间：2017/4/3 11:59:15 performance optimization data-structures haskell garbage-collection

本文介绍了改进抓捕实施的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

这是我实现一种treap（隐式键和节点中存储的一些其他信息）： http：/ /hpaste.org/42839/treap_with_implicit_keys

Here's my implementation of a sort of treap (with implicit keys and some additional information stored in nodes): http://hpaste.org/42839/treap_with_implicit_keys

根据剖析数据，GC需要80％的时间用于此程序。据我所知，这是由于每次修改节点时，重新创建到根的路径上的每个节点。

According to profiling data GC takes 80% of time for this program. As far as I understand, it's caused by the fact that every time a node is 'modified', each node on the path to the root is recreated.

有没有什么可以在这里提高性能，或者我必须下降到ST monad的领域？

Is there something I can do here to improve performance or I have to descend into the realm of ST monad?

推荐答案

使用GHC 7.0.3，我可以重现您的重GC流程：

Using GHC 7.0.3, I can reproduce your heavy GC behavior:

  $ time ./A +RTS -s
  %GC time      92.9%  (92.9% elapsed)
  ./A +RTS -s  7.24s user 0.04s system 99% cpu 7.301 total

我花了10分钟的时间来完成程序。这是我做的，按顺序：

I spent 10 minutes going through the program. Here's what I did, in order:

设置GHC的-H标志，增加GC的限制

检查解包

改进内联

调整第一代分配区域

Set GHC's -H flag, increasing limits in the GC
Check unpacking
Improve inlining
Adjust the first generation allocation area

导致10倍的加速，GC约为45％的时间。

Resulting in a 10 fold speedup, and GC around 45% of time.

按顺序，使用GHC的魔法 -H 标志，我们可以减少这个运行时：

In order, using GHC's magic -H flag, we can reduce that runtime quite a bit:

  $ time ./A +RTS -s -H
  %GC time      74.3%  (75.3% elapsed)
  ./A +RTS -s -H  2.34s user 0.04s system 99% cpu 2.392 total

不错！

Tree 节点上的UNPACK编译指示将不会执行任何操作，因此请删除它们。

The UNPACK pragmas on the Tree nodes won't do anything, so remove those.

内联更新更多的运行时：

 ./A +RTS -s -H  1.84s user 0.04s system 99% cpu 1.883 total

与内嵌 height

 ./A +RTS -s -H  1.74s user 0.03s system 99% cpu 1.777 total

所以while它是快速的，GC仍然是主导的 - 因为我们测试分配。
我们可以做的一件事是增加第一个gen大小：

So while it is fast, GC is still dominating -- since we're testing allocation, after all. One thing we can do is increase the first gen size:

 $ time ./A +RTS -s -A200M
 %GC time      45.1%  (40.5% elapsed)
 ./A +RTS -s -A200M  0.71s user 0.16s system 99% cpu 0.872 total

如JohnL所建议的那样，增加展开门槛有所帮助，

And increasing the unfolding threshold, as JohnL suggested, helps a little,

 ./A +RTS -s -A100M  0.74s user 0.09s system 99% cpu 0.826 total

这是什么，比我们开始快10倍？不错。

which is what, 10x faster than we started? Not bad.

使用 ghc-gc-tune ，您可以将运行时视为 -A 和 -H ，

Using ghc-gc-tune, you can see runtime as a function of -A and -H,

有趣的是，最佳运行时间使用非常大的 -A 值，例如

Interestingly, the best running times use very large -A values, e.g.

$ time ./A +RTS -A500M   
./A +RTS -A500M  0.49s user 0.28s system 99% cpu 0.776s

这篇关于改进抓捕实施的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

改进抓捕实施 [英] Improving treap implementation

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

改进抓捕实施 [英] Improving treap implementation

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭