分析 Haskell 程序性能的工具 [英] Tools for analyzing performance of a Haskell program

查看:21
本文介绍了分析 Haskell 程序性能的工具的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在解决一些 Project Euler 问题以学习 Haskell 时(所以目前我是一个完全的初学者)我过来了 优化,所以我'会改写的使用 Data.Vector,如下所示:

numDivs n = fromIntegral $2 + (U.length $U.filter (x -> fromIntegral n `rem` x == 0) $(U.enumFromN 2 ((fromIntegral n `div` 2) + 1) :: U.Vector Int))

应该融合成一个循环,没有不必要的堆分配.也就是说,它将比列表版本具有更好的复杂性(通过常数因子).优化后可以使用ghc-core工具(高级用户)检查中间代码.

测试这个,ghc -O2 --make Z.hs

$ time ./Z749700./Z 3.73s 用户 0.01s 系统 99% cpu 3.753 总计

因此它将 N=150 的运行时间减少了 3.5 倍,而无需更改算法本身.

结论

您的问题是 numDivs.它是您运行时间的 100%,并且具有可怕的复杂性.想想 numDivs,例如,对于每个 N,您如何生成 [2 .. n div 2 + 1] N 次.尝试记住它,因为值不会改变.

要衡量哪个函数更快,请考虑使用 criterion,这将提供统计上的稳健性有关亚微秒级运行时间改进的信息.

<小时>

附录

由于 numDivs 是您运行时间的 100%,因此触及程序的其他部分不会有太大区别,然而,出于教学目的,我们也可以使用流融合来重写那些.

我们也可以重写trialList,依靠fusion把它变成你在trialList2中手写的循环,这是一个前缀扫描"功能(又名 scanl):

triaList = U.scanl (+) 0 (U.enumFrom 1 top)在哪里顶部 = 10^6

sol 类似:

sol :: Int ->整数sol n = U.head $ U.filter (x -> numDivs x > n) triaList

总体运行时间相同,但代码更简洁.

While solving some Project Euler Problems to learn Haskell (so currently I'm a completly beginner) I came over Problem 12. I wrote this (naive) solution:

--Get Number of Divisors of n
numDivs :: Integer -> Integer
numDivs n = toInteger $ length [ x | x<-[2.. ((n `quot` 2)+1)], n `rem` x == 0] + 2

--Generate a List of Triangular Values
triaList :: [Integer]
triaList =  [foldr (+) 0 [1..n] | n <- [1..]]

--The same recursive
triaList2 = go 0 1
  where go cs n = (cs+n):go (cs+n) (n+1)

--Finds the first triangular Value with more than n Divisors
sol :: Integer -> Integer
sol n = head $ filter (x -> numDivs(x)>n) triaList2

This Solution for n=500 (sol 500) is extremely slow (running for more than 2 hours now), so I wondered how to find out why this solution is so slow. Are there any commands that tell me where most of the computation-time is spent so I know which part of my haskell-program is slow? Something like a simple profiler.

To make it clear, I'm not asking for a faster solution but for a way to find this solution. How would you start if you would have no haskell knowledge?

I tried to write two triaList functions but found no way to test which one is faster, so this is where my problems start.

Thanks

解决方案

how to find out why this solution is so slow. Are there any commands that tell me where most of the computation-time is spend so I know which part of my haskell-program is slow?

Precisely! GHC provides many excellent tools, including:

A tutorial on using time and space profiling is part of Real World Haskell.

GC Statistics

Firstly, ensure you're compiling with ghc -O2. And you might make sure it is a modern GHC (e.g. GHC 6.12.x)

The first thing we can do is check that garbage collection isn't the problem. Run your program with +RTS -s

$ time ./A +RTS -s
./A +RTS -s 
749700
   9,961,432,992 bytes allocated in the heap
       2,463,072 bytes copied during GC
          29,200 bytes maximum residency (1 sample(s))
         187,336 bytes maximum slop
               **2 MB** total memory in use (0 MB lost due to fragmentation)

  Generation 0: 19002 collections,     0 parallel,  0.11s,  0.15s elapsed
  Generation 1:     1 collections,     0 parallel,  0.00s,  0.00s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time   13.15s  ( 13.32s elapsed)
  GC    time    0.11s  (  0.15s elapsed)
  RP    time    0.00s  (  0.00s elapsed)
  PROF  time    0.00s  (  0.00s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   13.26s  ( 13.47s elapsed)

  %GC time       **0.8%**  (1.1% elapsed)

  Alloc rate    757,764,753 bytes per MUT second

  Productivity  99.2% of total user, 97.6% of total elapsed

./A +RTS -s  13.26s user 0.05s system 98% cpu 13.479 total

Which already gives us a lot of information: you only have a 2M heap, and GC takes up 0.8% of time. So no need to worry that allocation is the problem.

Time Profiles

Getting a time profile for your program is straight forward: compile with -prof -auto-all

 $ ghc -O2 --make A.hs -prof -auto-all
 [1 of 1] Compiling Main             ( A.hs, A.o )
 Linking A ...

And, for N=200:

$ time ./A +RTS -p                   
749700
./A +RTS -p  13.23s user 0.06s system 98% cpu 13.547 total

which creates a file, A.prof, containing:

    Sun Jul 18 10:08 2010 Time and Allocation Profiling Report  (Final)

       A +RTS -p -RTS

    total time  =     13.18 secs   (659 ticks @ 20 ms)
    total alloc = 4,904,116,696 bytes  (excludes profiling overheads)

COST CENTRE          MODULE         %time %alloc

numDivs            Main         100.0  100.0

Indicating that all your time is spent in numDivs, and it is also the source of all your allocations.

Heap Profiles

You can also get a break down of those allocations, by running with +RTS -p -hy, which creates A.hp, which you can view by converting it to a postscript file (hp2ps -c A.hp), generating:

which tells us there's nothing wrong with your memory use: it is allocating in constant space.

So your problem is algorithmic complexity of numDivs:

toInteger $ length [ x | x<-[2.. ((n `quot` 2)+1)], n `rem` x == 0] + 2

Fix that, which is 100% of your running time, and everything else is easy.

Optimizations

This expression is a good candidate for the stream fusion optimization, so I'll rewrite it to use Data.Vector, like so:

numDivs n = fromIntegral $
    2 + (U.length $
        U.filter (x -> fromIntegral n `rem` x == 0) $
        (U.enumFromN 2 ((fromIntegral n `div` 2) + 1) :: U.Vector Int))

Which should fuse into a single loop with no unnecessary heap allocations. That is, it will have better complexity (by constant factors) than the list version. You can use the ghc-core tool (for advanced users) to inspect the intermediate code after optimization.

Testing this, ghc -O2 --make Z.hs

$ time ./Z     
749700
./Z  3.73s user 0.01s system 99% cpu 3.753 total

So it reduced running time for N=150 by 3.5x, without changing the algorithm itself.

Conclusion

Your problem is numDivs. It is 100% of your running time, and has terrible complexity. Think about numDivs, and how, for example, for each N you are generating [2 .. n div 2 + 1] N times. Try memoizing that, since the values don't change.

To measure which of your functions is faster, consider using criterion, which will provide statistically robust information about sub-microsecond improvements in running time.


Addenda

Since numDivs is 100% of your running time, touching other parts of the program won't make much difference, however, for pedagogical purposes, we can also rewrite those using stream fusion.

We can also rewrite trialList, and rely on fusion to turn it into the loop you write by hand in trialList2, which is a "prefix scan" function (aka scanl):

triaList = U.scanl (+) 0 (U.enumFrom 1 top)
    where
       top = 10^6

Similarly for sol:

sol :: Int -> Int
sol n = U.head $ U.filter (x -> numDivs x > n) triaList

With the same overall running time, but a bit cleaner code.

这篇关于分析 Haskell 程序性能的工具的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆