Controlling memory allocation/GC in a simulation?


Problem Description


    I'm having a bit of trouble figuring out how to reduce memory usage and GC time in a simulation running in the State monad. Presently I have to run the compiled code with +RTS -K100M to avoid stack space overflow, and the GC stats are pretty hideous (see below).

    Here are relevant snippets of the code. Complete, working (GHC 7.4.1) code can be found at http://hpaste.org/68527.

    -- Lone algebraic data type holding the simulation configuration.
    data SimConfig = SimConfig {
            numDimensions :: !Int            -- strict
        ,   numWalkers    :: !Int            -- strict
        ,   simArray      :: IntMap [Double] -- strict spine
        ,   logP          :: Seq Double      -- strict spine
        ,   logL          :: Seq Double      -- strict spine
        ,   pairStream    :: [(Int, Int)]    -- lazy (infinite) list of random vals
        ,   doubleStream  :: [Double]        -- lazy (infinite) list of random vals
        } deriving Show
    
    -- The transition kernel for the simulation.
    simKernel :: State SimConfig ()
    simKernel = do
        config <- get
        let arr   = simArray      config
        let n     = numWalkers    config
        let d     = numDimensions config
        let rstm0 = pairStream    config
        let rstm1 = doubleStream  config
        let lp    = logP          config
        let ll    = logL          config
    
        let (a, b)    = head rstm0                           -- uses random stream    
        let z0 = head . map affineTransform $ take 1 rstm1   -- uses random stream
                where affineTransform a = 0.5 * (a + 1) ^ 2
    
    
        let proposal  = zipWith (+) r1 r2
                where r1    = map (*z0)     $ fromJust (IntMap.lookup a arr)
                      r2    = map (*(1-z0)) $ fromJust (IntMap.lookup b arr)
    
        let logA = if val > 0 then 0 else val
                where val = logP_proposal + logL_proposal - (lp `index` (a - 1)) - (ll `index` (a - 1)) + ((fromIntegral n - 1) * log z0)
                      logP_proposal = logPrior proposal
                      logL_proposal = logLikelihood proposal
    
        let cVal       = (rstm1 !! 1) <= exp logA            -- uses random stream
    
        let newConfig = SimConfig { simArray = if   cVal
                                               then IntMap.update (\_ -> Just proposal) a arr
                                               else arr
                                  , numWalkers = n
                                  , numDimensions = d
                                  , pairStream   = drop 1 rstm0
                                  , doubleStream = drop 2 rstm1
                                  , logP = if   cVal
                                           then Seq.update (a - 1) (logPrior proposal) lp
                                           else lp
                                  , logL = if   cVal
                                           then Seq.update (a - 1) (logLikelihood proposal) ll
                                           else ll
                                  }
    
        put newConfig
    
    main = do 
        -- (some stuff omitted)
        let sim = logL $ (`execState` initConfig) . replicateM 100000 $ simKernel
        print sim
    

    In terms of the heap, a profile seems to indicate that the System.Random functions, along with (,), are the main memory culprits. I can't include an image directly, but you can see a heap profile here: http://i.imgur.com/5LKxX.png.

    I have no idea how to reduce the presence of those things any further. The random variates are generated outside the State monad (to avoid splitting the generator on every iteration), and I believe the only instance of (,) inside simKernel arises when plucking a pair from the lazy list (pairStream) that is included in the simulation configuration.
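
    The stream construction itself isn't excerpted above; here is a minimal sketch of one way such streams could be built (mkStreams and the index range are just placeholders, not the code from the hpaste):

        import System.Random (StdGen, randomRs, split)

        -- Build both infinite streams from a single generator, outside the
        -- State monad, so simKernel never has to split a generator itself.
        mkStreams :: Int -> StdGen -> ([(Int, Int)], [Double])
        mkStreams n g = (zip as bs, ds)
          where
            (gPairs, gDoubles) = split g
            (gA, gB)           = split gPairs
            as = randomRs (1, n) gA        -- walker indices
            bs = randomRs (1, n) gB
            ds = randomRs (0, 1) gDoubles  -- uniform draws

    In a setup like this, every cons cell and every (,) produced by the zip is allocated lazily as the kernel demands it, which would be consistent with System.Random closures and (,) dominating the heap profile.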

    The stats, including GC, are as follows:

      1,220,911,360 bytes allocated in the heap
         787,192,920 bytes copied during GC
         186,821,752 bytes maximum residency (10 sample(s))
           1,030,400 bytes maximum slop
                 449 MB total memory in use (0 MB lost due to fragmentation)
    
                                        Tot time (elapsed)  Avg pause  Max pause
      Gen  0      2159 colls,     0 par    0.80s    0.81s     0.0004s    0.0283s
      Gen  1        10 colls,     0 par    0.96s    1.09s     0.1094s    0.4354s
    
      INIT    time    0.00s  (  0.00s elapsed)
      MUT     time    0.95s  (  0.97s elapsed)
      GC      time    1.76s  (  1.91s elapsed)
      EXIT    time    0.00s  (  0.00s elapsed)
      Total   time    2.72s  (  2.88s elapsed)
    
      %GC     time      64.9%  (66.2% elapsed)
    
      Alloc rate    1,278,074,521 bytes per MUT second
    
      Productivity  35.1% of total user, 33.1% of total elapsed
    

    And again, I have to bump up the maximum stack size in order to even run the simulation. I know there must be a big thunk building up somewhere, but I can't figure out where.
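
    As an illustration of the failure mode (a toy example, not the hpaste code): with the lazy State monad, a plain put stores whatever thunk it is handed, and nothing demands the accumulated chain until the final state is inspected.

        import Control.Monad (replicateM_)
        import Control.Monad.State (State, get, put, execState)

        step :: State Int ()
        step = do
            n <- get
            put (n + 1)   -- stores an unevaluated (n + 1); nothing here forces it

        -- `print` is the first thing that demands the state, so the whole chain
        -- of suspended (+1)s built over 10^5 iterations is evaluated at once.
        main :: IO ()
        main = print (execState (replicateM_ 100000 step) 0)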

    How can I improve the heap/stack allocation and GC in a problem like this? How can I identify where a thunk may be building up? Is the use of the State monad here misguided?

    --

    UPDATE:

    I neglected to look over the output of the profiler when compiling with -fprof-auto. Here is the head of that output:

    COST CENTRE                       MODULE                             no.     entries  %time %alloc   %time %alloc
    
    MAIN                              MAIN                                58           0    0.0    0.0   100.0  100.0
     main                             Main                               117           0    0.0    0.0   100.0  100.0
      main.randomList                 Main                               147           1   62.0   55.5    62.0   55.5
      main.arr                        Main                               142           1    0.0    0.0     0.0    0.0
       streamToAssocList              Main                               143           1    0.0    0.0     0.0    0.0
        streamToAssocList.go          Main                               146           5    0.0    0.0     0.0    0.0
      main.pairList                   Main                               137           1    0.0    0.0     9.5   16.5
       consPairStream                 Main                               138           1    0.7    0.9     9.5   16.5
        consPairStream.ys             Main                               140           1    4.3    7.8     4.3    7.8
        consPairStream.xs             Main                               139           1    4.5    7.8     4.5    7.8
      main.initConfig                 Main                               122           1    0.0    0.0     0.0    0.0
       logLikelihood                  Main                               163           0    0.0    0.0     0.0    0.0
       logPrior                       Main                               161           5    0.0    0.0     0.0    0.0
      main.sim                        Main                               118           1    1.0    2.2    28.6   28.1
       simKernel                      Main                               120           0    4.8    5.1    27.6   25.8 
    

    I'm not sure how to interpret this exactly, but the lazy stream of random doubles, randomList, makes me wince. I have no idea how that could be improved.

Solution

    I've updated the hpaste with a working example. It looks like the culprits are:

    • Missing strictness annotations in three SimConfig fields: simArray, logP and logL

        data SimConfig = SimConfig {
                numDimensions :: !Int            -- strict
            ,   numWalkers    :: !Int            -- strict
            ,   simArray      :: !(IntMap [Double]) -- strict spine
            ,   logP          :: !(Seq Double)      -- strict spine
            ,   logL          :: !(Seq Double)      -- strict spine
            ,   pairStream    :: [(Int, Int)]    -- lazy
            ,   doubleStream  :: [Double]        -- lazy 
            } deriving Show
    

    • newConfig was never evaluated in the simKernel loop due to State being lazy. Another alternative would be to use the strict State monad instead.

      put $! newConfig
      

    • execState ... replicateM also builds thunks. I originally replaced this with a foldl' and moved the execState into the fold, but I would think swapping in replicateM_ is equivalent and easier to read:

      let sim = logL $ execState (replicateM_ epochs simKernel) initConfig
      --  sim = logL $ foldl' (const . execState simKernel) initConfig [1..epochs]
      

    And a few calls to mapM .. replicate have been replaced with replicateM; this is particularly noticeable in consPairList, where it cuts memory usage quite a bit (a rough sketch of the shape of that change follows). There is still room for improvement, but the lowest-hanging fruit involves unsafeInterleaveST... so I stopped.
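
    A rough sketch of the shape of that change (the real consPairList isn't excerpted here, so drawPairs, the use of IO, and the index range are only assumptions):

        import Control.Applicative ((<$>), (<*>))  -- needed on GHC 7.4
        import Control.Monad (replicateM)
        import System.Random (randomRIO)

        -- Hypothetical stand-in for the pair-drawing step; w is the number of walkers.
        drawPairs :: Int -> Int -> IO [(Int, Int)]
        -- before: an intermediate list of n units exists only to drive mapM
        -- drawPairs n w = mapM (\_ -> (,) <$> randomRIO (1, w) <*> randomRIO (1, w)) (replicate n ())
        -- after: replicateM runs the draw n times directly
        drawPairs n w = replicateM n ((,) <$> randomRIO (1, w) <*> randomRIO (1, w))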

    I have no idea if the output results are what you want:

    fromList [-4.287033457733427,-1.8000404912760795,-5.581988678626085,-0.9362372340483293,-5.267791907985331]
    

    But here are the stats:

         268,004,448 bytes allocated in the heap
          70,753,952 bytes copied during GC
          16,014,224 bytes maximum residency (7 sample(s))
           1,372,456 bytes maximum slop
                  40 MB total memory in use (0 MB lost due to fragmentation)
    
                                        Tot time (elapsed)  Avg pause  Max pause
      Gen  0       490 colls,     0 par    0.05s    0.05s     0.0001s    0.0012s
      Gen  1         7 colls,     0 par    0.04s    0.05s     0.0076s    0.0209s
    
      INIT    time    0.00s  (  0.00s elapsed)
      MUT     time    0.12s  (  0.12s elapsed)
      GC      time    0.09s  (  0.10s elapsed)
      EXIT    time    0.00s  (  0.00s elapsed)
      Total   time    0.21s  (  0.22s elapsed)
    
      %GC     time      42.2%  (45.1% elapsed)
    
      Alloc rate    2,241,514,569 bytes per MUT second
    
      Productivity  57.8% of total user, 53.7% of total elapsed
    
