Controlling memory allocation/GC in a simulation?
I'm having a bit of trouble figuring out how to reduce memory usage and GC time in a simulation running in the State monad. Presently I have to run the compiled code with +RTS -K100M to avoid stack space overflow, and the GC stats are pretty hideous (see below).

Here are relevant snippets of the code. Complete, working (GHC 7.4.1) code can be found at http://hpaste.org/68527.

    -- Lone algebraic data type holding the simulation configuration.
    data SimConfig = SimConfig {
            numDimensions :: !Int            -- strict
          , numWalkers    :: !Int            -- strict
          , simArray      :: IntMap [Double] -- strict spine
          , logP          :: Seq Double      -- strict spine
          , logL          :: Seq Double      -- strict spine
          , pairStream    :: [(Int, Int)]    -- lazy (infinite) list of random vals
          , doubleStream  :: [Double]        -- lazy (infinite) list of random vals
          } deriving Show

    -- The transition kernel for the simulation.
    simKernel :: State SimConfig ()
    simKernel = do
        config <- get
        let arr   = simArray      config
        let n     = numWalkers    config
        let d     = numDimensions config
        let rstm0 = pairStream    config
        let rstm1 = doubleStream  config
        let lp    = logP          config
        let ll    = logL          config

        let (a, b) = head rstm0                                   -- uses random stream
        let z0 = head . map affineTransform $ take 1 rstm1        -- uses random stream
              where affineTransform a = 0.5 * (a + 1) ^ 2

        let proposal = zipWith (+) r1 r2
              where r1 = map (*z0)     $ fromJust (IntMap.lookup a arr)
                    r2 = map (*(1-z0)) $ fromJust (IntMap.lookup b arr)

        let logA = if val > 0 then 0 else val
              where val = logP_proposal + logL_proposal - (lp `index` (a - 1)) - (ll `index` (a - 1)) + ((fromIntegral n - 1) * log z0)
                    logP_proposal = logPrior proposal
                    logL_proposal = logLikelihood proposal

        let cVal = (rstm1 !! 1) <= exp logA                       -- uses random stream

        let newConfig = SimConfig { simArray      = if cVal
                                                      then IntMap.update (\_ -> Just proposal) a arr
                                                      else arr
                                  , numWalkers    = n
                                  , numDimensions = d
                                  , pairStream    = drop 1 rstm0
                                  , doubleStream  = drop 2 rstm1
                                  , logP          = if cVal
                                                      then Seq.update (a - 1) (logPrior proposal) lp
                                                      else lp
                                  , logL          = if cVal
                                                      then Seq.update (a - 1) (logLikelihood proposal) ll
                                                      else ll
                                  }

        put newConfig

    main = do
        -- (some stuff omitted)
        let sim = logL $ (`execState` initConfig) . replicateM 100000 $ simKernel
        print sim

In terms of the heap, a profile seems to suggest that the System.Random functions, in addition to (,), are the memory culprits. I can't include an image directly, but you can see a heap profile here: http://i.imgur.com/5LKxX.png.

I have no idea how to reduce the presence of those things any further. The random variates are generated outside the State monad (to avoid splitting the generator on every iteration), and I believe the only instance of (,) inside simKernel arises when plucking a pair from the lazy list (pairStream) that is included in the simulation configuration.

The stats, including GC, are as follows:

       1,220,911,360 bytes allocated in the heap
         787,192,920 bytes copied during GC
         186,821,752 bytes maximum residency (10 sample(s))
           1,030,400 bytes maximum slop
                 449 MB total memory in use (0 MB lost due to fragmentation)

                                        Tot time (elapsed)  Avg pause  Max pause
      Gen  0      2159 colls,     0 par    0.80s    0.81s     0.0004s    0.0283s
      Gen  1        10 colls,     0 par    0.96s    1.09s     0.1094s    0.4354s

      INIT    time    0.00s  (  0.00s elapsed)
      MUT     time    0.95s  (  0.97s elapsed)
      GC      time    1.76s  (  1.91s elapsed)
      EXIT    time    0.00s  (  0.00s elapsed)
      Total   time    2.72s  (  2.88s elapsed)

      %GC     time      64.9%  (66.2% elapsed)

      Alloc rate    1,278,074,521 bytes per MUT second

      Productivity  35.1% of total user, 33.1% of total elapsed

And again, I have to bump up the maximum stack size in order to even run the simulation. I know there must be a big thunk building up somewhere, but I can't figure out where.

How can I improve the heap/stack allocation and GC in a problem like this? How can I identify where a thunk may be building up? Is the use of the State monad here misguided?

--
UPDATE:
I neglected to look over the output of the profiler when compiling with -fprof-auto. Here is the head of that output:

                                                     individual      inherited
    COST CENTRE            MODULE  no.   entries   %time  %alloc   %time  %alloc

    MAIN                   MAIN     58         0     0.0     0.0   100.0   100.0
    main                   Main    117         0     0.0     0.0   100.0   100.0
    main.randomList        Main    147         1    62.0    55.5    62.0    55.5
    main.arr               Main    142         1     0.0     0.0     0.0     0.0
    streamToAssocList      Main    143         1     0.0     0.0     0.0     0.0
    streamToAssocList.go   Main    146         5     0.0     0.0     0.0     0.0
    main.pairList          Main    137         1     0.0     0.0     9.5    16.5
    consPairStream         Main    138         1     0.7     0.9     9.5    16.5
    consPairStream.ys      Main    140         1     4.3     7.8     4.3     7.8
    consPairStream.xs      Main    139         1     4.5     7.8     4.5     7.8
    main.initConfig        Main    122         1     0.0     0.0     0.0     0.0
    logLikelihood          Main    163         0     0.0     0.0     0.0     0.0
    logPrior               Main    161         5     0.0     0.0     0.0     0.0
    main.sim               Main    118         1     1.0     2.2    28.6    28.1
    simKernel              Main    120         0     4.8     5.1    27.6    25.8

I'm not sure how to interpret this exactly, but the lazy stream of random doubles, randomList, makes me wince. I have no idea how that could be improved.

I've updated the hpaste with a working example. It looks like the culprits are:

- Missing strictness annotations in three SimConfig fields: simArray, logP and logL.

      data SimConfig = SimConfig {
            numDimensions :: !Int                -- strict
          , numWalkers    :: !Int                -- strict
          , simArray      :: !(IntMap [Double])  -- strict spine
          , logP          :: !(Seq Double)       -- strict spine
          , logL          :: !(Seq Double)       -- strict spine
          , pairStream    :: [(Int, Int)]        -- lazy
          , doubleStream  :: [Double]            -- lazy
          } deriving Show

- newConfig was never evaluated in the simKernel loop because State is lazy. Forcing it before storing it, as below, addresses that; another alternative would be to use the strict State monad instead.

      put $! newConfig

- execState ... replicateM also builds thunks. I originally replaced this with a foldl' and moved the execState into the fold, but I would think swapping in replicateM_ is equivalent and easier to read:

      let sim = logL $ execState (replicateM_ epochs simKernel) initConfig
      -- sim = logL $ foldl' (const . execState simKernel) initConfig [1..epochs]

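To make the three points concrete, here is a minimal sketch on a toy state rather than on SimConfig. The names Toy, step and runToy are hypothetical, and the sketch assumes the mtl package. It also makes one choice the answer does not spell out: it imports Control.Monad.State.Strict and uses put $! together, so that each step's state is forced before the next step runs.

```haskell
module Main where

import Control.Monad (replicateM_)
import Control.Monad.State.Strict (State, execState, get, put)

-- Hypothetical stand-in for SimConfig: one strict field, one lazy stream.
data Toy = Toy
    { total  :: !Double   -- strict: evaluated whenever a Toy value is forced
    , stream :: [Double]  -- lazy (possibly infinite) input stream
    }

-- One step: take the next value from the stream and add it to the total.
step :: State Toy ()
step = do
    s <- get
    case stream s of
        []       -> return ()                     -- stream exhausted: state unchanged
        (x : xs) -> put $! Toy (total s + x) xs   -- force the new state before storing it

-- Run n steps; replicateM_ discards the () results instead of collecting them in a list.
runToy :: Int -> Double
runToy n = total (execState (replicateM_ n step) (Toy 0 (cycle [1, 2, 3])))

main :: IO ()
main = print (runToy 300000)
```

Because the total field is strict, forcing the Toy value with $! also forces the running sum, so no chain of unevaluated additions is left behind for the final print to unwind.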
A few calls to mapM .. replicate have also been replaced with replicateM. This is particularly noteworthy in consPairList, where it reduces memory usage quite a bit. There is still room for improvement, but the lowest-hanging fruit involves unsafeInterleaveST, so I stopped.

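The original mapM .. replicate calls are not shown here, so the following is only a guess at their shape. It checks that the two spellings produce the same values. nextInt is a hypothetical stand-in generator (a simple linear congruential step, not System.Random), and the sketch makes no claim about memory behaviour.

```haskell
module Main where

import Control.Monad (replicateM)
import Control.Monad.State.Strict (State, evalState, state)

-- Hypothetical generator: one linear congruential step over an Int seed.
nextInt :: State Int Int
nextInt = state $ \s ->
    let s' = (s * 1103515245 + 12345) `mod` 2147483648
    in  (s', s')

-- Assumed "before" shape: build a throwaway list, then map the action over it.
viaMapM :: Int -> State Int [Int]
viaMapM n = mapM (const nextInt) (replicate n ())

-- "After" shape: repeat the action directly.
viaReplicateM :: Int -> State Int [Int]
viaReplicateM n = replicateM n nextInt

main :: IO ()
main = print (evalState (viaMapM 5) 42 == evalState (viaReplicateM 5) 42)
```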
I have no idea if the output results are what you want:
fromList [-4.287033457733427,-1.8000404912760795,-5.581988678626085,-0.9362372340483293,-5.267791907985331]
But here are the stats:

         268,004,448 bytes allocated in the heap
          70,753,952 bytes copied during GC
          16,014,224 bytes maximum residency (7 sample(s))
           1,372,456 bytes maximum slop
                  40 MB total memory in use (0 MB lost due to fragmentation)

                                        Tot time (elapsed)  Avg pause  Max pause
      Gen  0       490 colls,     0 par    0.05s    0.05s     0.0001s    0.0012s
      Gen  1         7 colls,     0 par    0.04s    0.05s     0.0076s    0.0209s

      INIT    time    0.00s  (  0.00s elapsed)
      MUT     time    0.12s  (  0.12s elapsed)
      GC      time    0.09s  (  0.10s elapsed)
      EXIT    time    0.00s  (  0.00s elapsed)
      Total   time    0.21s  (  0.22s elapsed)

      %GC     time      42.2%  (45.1% elapsed)

      Alloc rate    2,241,514,569 bytes per MUT second

      Productivity  57.8% of total user, 53.7% of total elapsed