Controlling memory allocation/GC in a simulation?


Problem Description


    I'm having a bit of trouble figuring out how to reduce memory usage and GC time in a simulation running in the State monad. Presently I have to run the compiled code with +RTS -K100M to avoid stack space overflow, and the GC stats are pretty hideous (see below).

    Here are relevant snippets of the code. Complete, working (GHC 7.4.1) code can be found at http://hpaste.org/68527.

    -- Lone algebraic data type holding the simulation configuration.
    data SimConfig = SimConfig {
            numDimensions :: !Int            -- strict
        ,   numWalkers    :: !Int            -- strict
        ,   simArray      :: IntMap [Double] -- strict spine
        ,   logP          :: Seq Double      -- strict spine
        ,   logL          :: Seq Double      -- strict spine
        ,   pairStream    :: [(Int, Int)]    -- lazy (infinite) list of random vals
        ,   doubleStream  :: [Double]        -- lazy (infinite) list of random vals
        } deriving Show
    
    -- The transition kernel for the simulation.
    simKernel :: State SimConfig ()
    simKernel = do
        config <- get
        let arr   = simArray      config
        let n     = numWalkers    config
        let d     = numDimensions config
        let rstm0 = pairStream    config
        let rstm1 = doubleStream  config
        let lp    = logP          config
        let ll    = logL          config
    
        let (a, b)    = head rstm0                           -- uses random stream    
        let z0 = head . map affineTransform $ take 1 rstm1   -- uses random stream
                where affineTransform a = 0.5 * (a + 1) ^ 2
    
    
        let proposal  = zipWith (+) r1 r2
                where r1    = map (*z0)     $ fromJust (IntMap.lookup a arr)
                      r2    = map (*(1-z0)) $ fromJust (IntMap.lookup b arr)
    
        let logA = if val > 0 then 0 else val
                where val = logP_proposal + logL_proposal - (lp `index` (a - 1)) - (ll `index` (a - 1)) + ((fromIntegral n - 1) * log z0)
                      logP_proposal = logPrior proposal
                      logL_proposal = logLikelihood proposal
    
        let cVal       = (rstm1 !! 1) <= exp logA            -- uses random stream
    
        let newConfig = SimConfig { simArray = if   cVal
                                               then IntMap.update (\_ -> Just proposal) a arr
                                               else arr
                                  , numWalkers = n
                                  , numDimensions = d
                                  , pairStream   = drop 1 rstm0
                                  , doubleStream = drop 2 rstm1
                                  , logP = if   cVal
                                           then Seq.update (a - 1) (logPrior proposal) lp
                                           else lp
                                  , logL = if   cVal
                                           then Seq.update (a - 1) (logLikelihood proposal) ll
                                           else ll
                                  }
    
        put newConfig
    
    main = do 
        -- (some stuff omitted)
        let sim = logL $ (`execState` initConfig) . replicateM 100000 $ simKernel
        print sim
    

    In terms of the heap, a profile seems to indicate that the System.Random functions, along with (,), are the main memory culprits. I can't include an image directly, but you can see a heap profile here: http://i.imgur.com/5LKxX.png.

    I have no idea how to reduce the presence of those things any further. The random variates are generated outside the State monad (to avoid splitting the generator on every iteration), and I believe the only instance of (,) inside simKernel arises when plucking a pair from the lazy list (pairStream) that is included in the simulation configuration.
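
    The stream construction itself isn't excerpted above; here is a minimal sketch of one way such streams could be built (mkStreams and the index range are just placeholders, not the code from the hpaste):

        import System.Random (StdGen, randomRs, split)

        -- Build both infinite streams from a single generator, outside the
        -- State monad, so simKernel never has to split a generator itself.
        mkStreams :: Int -> StdGen -> ([(Int, Int)], [Double])
        mkStreams n g = (zip as bs, ds)
          where
            (gPairs, gDoubles) = split g
            (gA, gB)           = split gPairs
            as = randomRs (1, n) gA        -- walker indices
            bs = randomRs (1, n) gB
            ds = randomRs (0, 1) gDoubles  -- uniform draws

    In a setup like this, every cons cell and every (,) produced by the zip is allocated lazily as the kernel demands it, which would be consistent with System.Random closures and (,) dominating the heap profile.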

    The stats, including GC, are as follows:

      1,220,911,360 bytes allocated in the heap
         787,192,920 bytes copied during GC
         186,821,752 bytes maximum residency (10 sample(s))
           1,030,400 bytes maximum slop
                 449 MB total memory in use (0 MB lost due to fragmentation)
    
                                        Tot time (elapsed)  Avg pause  Max pause
      Gen  0      2159 colls,     0 par    0.80s    0.81s     0.0004s    0.0283s
      Gen  1        10 colls,     0 par    0.96s    1.09s     0.1094s    0.4354s
    
      INIT    time    0.00s  (  0.00s elapsed)
      MUT     time    0.95s  (  0.97s elapsed)
      GC      time    1.76s  (  1.91s elapsed)
      EXIT    time    0.00s  (  0.00s elapsed)
      Total   time    2.72s  (  2.88s elapsed)
    
      %GC     time      64.9%  (66.2% elapsed)
    
      Alloc rate    1,278,074,521 bytes per MUT second
    
      Productivity  35.1% of total user, 33.1% of total elapsed
    

    And again, I have to bump up the maximum stack size in order to even run the simulation. I know there must be a big thunk building up somewhere, but I can't figure out where.
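
    As an illustration of the failure mode (a toy example, not the hpaste code): with the lazy State monad, a plain put stores whatever thunk it is handed, and nothing demands the accumulated chain until the final state is inspected.

        import Control.Monad (replicateM_)
        import Control.Monad.State (State, get, put, execState)

        step :: State Int ()
        step = do
            n <- get
            put (n + 1)   -- stores an unevaluated (n + 1); nothing here forces it

        -- `print` is the first thing that demands the state, so the whole chain
        -- of suspended (+1)s built over 10^5 iterations is evaluated at once.
        main :: IO ()
        main = print (execState (replicateM_ 100000 step) 0)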

    How can I improve the heap/stack allocation and GC in a problem like this? How can I identify where a thunk may be building up? Is the use of the State monad here misguided?

    --

    UPDATE:

    I neglected to look over the output of the profiler when compiling with -fprof-auto. Here is the head of that output:

    COST CENTRE                       MODULE                             no.     entries  %time %alloc   %time %alloc
    
    MAIN                              MAIN                                58           0    0.0    0.0   100.0  100.0
     main                             Main                               117           0    0.0    0.0   100.0  100.0
      main.randomList                 Main                               147           1   62.0   55.5    62.0   55.5
      main.arr                        Main                               142           1    0.0    0.0     0.0    0.0
       streamToAssocList              Main                               143           1    0.0    0.0     0.0    0.0
        streamToAssocList.go          Main                               146           5    0.0    0.0     0.0    0.0
      main.pairList                   Main                               137           1    0.0    0.0     9.5   16.5
       consPairStream                 Main                               138           1    0.7    0.9     9.5   16.5
        consPairStream.ys             Main                               140           1    4.3    7.8     4.3    7.8
        consPairStream.xs             Main                               139           1    4.5    7.8     4.5    7.8
      main.initConfig                 Main                               122           1    0.0    0.0     0.0    0.0
       logLikelihood                  Main                               163           0    0.0    0.0     0.0    0.0
       logPrior                       Main                               161           5    0.0    0.0     0.0    0.0
      main.sim                        Main                               118           1    1.0    2.2    28.6   28.1
       simKernel                      Main                               120           0    4.8    5.1    27.6   25.8 
    

    I'm not sure how to interpret this exactly, but the lazy stream of random doubles, randomList, makes me wince. I have no idea how that could be improved.

Solution

    I've updated the hpaste with a working example. It looks like the culprits are:

    • Missing strictness annotations in three SimConfig fields: simArray, logP and logL

        data SimConfig = SimConfig {
                numDimensions :: !Int            -- strict
            ,   numWalkers    :: !Int            -- strict
            ,   simArray      :: !(IntMap [Double]) -- strict spine
            ,   logP          :: !(Seq Double)      -- strict spine
            ,   logL          :: !(Seq Double)      -- strict spine
            ,   pairStream    :: [(Int, Int)]    -- lazy
            ,   doubleStream  :: [Double]        -- lazy 
            } deriving Show
    

    • newConfig was never evaluated in the simKernel loop due to State being lazy. Another alternative would be to use the strict State monad instead.

      put $! newConfig
      

    • execState ... replicateM also builds thunks. I originally replaced this with a foldl' and moved the execState into the fold, but I would think swapping in replicateM_ is equivalent and easier to read:

      let sim = logL $ execState (replicateM_ epochs simKernel) initConfig
      --  sim = logL $ foldl' (const . execState simKernel) initConfig [1..epochs]
      

    And a few calls to mapM .. replicate have been replaced with replicateM; this is particularly noticeable in consPairList, where it cuts memory usage quite a bit (a rough sketch of the shape of that change follows). There is still room for improvement, but the lowest-hanging fruit involves unsafeInterleaveST... so I stopped.
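
    A rough sketch of the shape of that change (the real consPairList isn't excerpted here, so drawPairs, the use of IO, and the index range are only assumptions):

        import Control.Applicative ((<$>), (<*>))  -- needed on GHC 7.4
        import Control.Monad (replicateM)
        import System.Random (randomRIO)

        -- Hypothetical stand-in for the pair-drawing step; w is the number of walkers.
        drawPairs :: Int -> Int -> IO [(Int, Int)]
        -- before: an intermediate list of n units exists only to drive mapM
        -- drawPairs n w = mapM (\_ -> (,) <$> randomRIO (1, w) <*> randomRIO (1, w)) (replicate n ())
        -- after: replicateM runs the draw n times directly
        drawPairs n w = replicateM n ((,) <$> randomRIO (1, w) <*> randomRIO (1, w))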

    I have no idea if the output results are what you want:

    fromList [-4.287033457733427,-1.8000404912760795,-5.581988678626085,-0.9362372340483293,-5.267791907985331]
    

    But here are the stats:

         268,004,448 bytes allocated in the heap
          70,753,952 bytes copied during GC
          16,014,224 bytes maximum residency (7 sample(s))
           1,372,456 bytes maximum slop
                  40 MB total memory in use (0 MB lost due to fragmentation)
    
                                        Tot time (elapsed)  Avg pause  Max pause
      Gen  0       490 colls,     0 par    0.05s    0.05s     0.0001s    0.0012s
      Gen  1         7 colls,     0 par    0.04s    0.05s     0.0076s    0.0209s
    
      INIT    time    0.00s  (  0.00s elapsed)
      MUT     time    0.12s  (  0.12s elapsed)
      GC      time    0.09s  (  0.10s elapsed)
      EXIT    time    0.00s  (  0.00s elapsed)
      Total   time    0.21s  (  0.22s elapsed)
    
      %GC     time      42.2%  (45.1% elapsed)
    
      Alloc rate    2,241,514,569 bytes per MUT second
    
      Productivity  57.8% of total user, 53.7% of total elapsed
    
