在Haskell中优化数组的性能 [英] Optimizing numerical array performance in Haskell
问题描述
运行噪声函数10次(我需要不同波长的噪声,例如地形高度,温度,树的位置等等),一个块(16×16×128块)中的每个块具有3个八度的噪声,或者总共大约一百万个对噪声功能的调用,大约需要700-800毫秒。尽管事实上在算法中没有明显的昂贵的操作(至少对我来说),但对于产生具有任何体面的速度的地形而言,这至少是一个数量级太慢的速度。只是楼,模,一些数组查找和基本的算术。下面列出了算法(用Haskell编写)。 SCC的意见是为了分析。我已经省略了二维噪声函数,因为它们的工作方式是一样的。
g3 ::(Floating a,RealFrac a) => a
g3 = 1/6
{ - #INLINE int# - }
int ::(积分a,数字b)=> a - > b
int = fromIntegral
grad3 ::(Floating a,RealFrac a)=> V.Vector(a,a,a)
grad3 = V.fromList $ [(1,1,0),( - 1,1,0),(1,-1,0),( - 1 ,-1,0),
(1,0,1),( - 1,0,0),(1,0,-1),( - 1,0,-1),
(0,1,1),(0,-1,1),(0,1,-1),(0,-1,-1)]
{ - #INLINE dot3 # - }
dot3 :: Num a => (a,a,a) - > a - > a - > a - > a
dot3(a,b,c)xyz = a * x + b * y + c * z
{ - #INLINE fastFloor# - }
fastFloor :: RealFrac a => a - > Int
fastFloor x = truncate(如果x> 0则x else x - 1)
- 产生噪声函数中使用的随机置换
perm :: Int - >排列
perm seed = V.fromList。 concat。复制2。 shuffle'[0..255] 256 $ mkStdGen种子
- 产生-0.5到0.5之间的3D噪音
simplex3D ::(Floating a,RealFrac a)=>置换 - > a - > a - > a - >
simplex3D pxyz = { - #SCCout# - } 16 *(n gi0(x0,y0,z0)+ n gi1 xyz1 + n gi2 xyz2 + n gi3 xyz3) (x,y,z)/ 3)
(x0,y0,z0) )= { - #SCCx0-z0# - }(x - int i + t,y - int j + t,z - int k + t)其中t = int(i + j + k)* g3 $如果x0> = y0
,那么如果y0> = z0,则b(b1,b1,j1,k1,i2,j2,k2)= { - #SCCi1-k2 0,0,1,1,0)否则
如果x0> = z0那么(1,0,0,1,0,1)else(0,0,1,1,0,1)
else if y0< z0 then(0,0,1,0,1,1)else
if x0 < z0 then(0,1,0,0,1,1)else(0,1,0,1,1,0)
xyz1 = { - #SCCxyz1# - }(x0 - int i1 + g3,y0-int j1 + g3,z0-int k1 + g3)
xyz2 = { - #SCCxyz2# - }(x0 - int i2 + 2 * g3,y0 - int j2 + 2 * g3,z0-int k2 + 2 * g3)
xyz3 = { - #SCCxyz3# - }(x0-1 + 3 * g3,y0-1 + 3 * g3,z0-1 + 3 * g3)
(ii,jj,kk)= { - #SCCiijjkk# - }(i。& 255,j。& 255,k。& 255)
gi0 = { - #SCCgi0# - } mod(pV!!(ii + pV!!(jj + pV!!kk)))12
gi1 = { - #SCCgi1 ()()()()()()()()()()() - } mod(p V!!(ii + i2 + p V!!(jj + j2 + p V!!(kk + k2))))12
gi3 = { - #SCCgi3# - (p + 1 + pV!!(kk + 1))))12
{ - #INLINE n# - }
n gi(x',y',z')= { - #SCCn# - }(\ a - >> if a <0 then 0 else
a * a * a * a * dot3 grad3 V.!gi)x'y'z')$ 0.6 - x'* x' - y'* y' - z * Z
谐波::(民一个,分数A)=> Int - > (a - > a) - > a
谐波八度噪声= f八度/(2-1 / int(2 ^(octave-1)))其中
f 0 = 0
fo = let r = int $ 2 ^(o - 1)in noise r / r + f(o - 1)
- 介于-0.5和0.5
之间的泛音三维噪声harmonicNoise3D ::(RealFrac a,Floating a)=> ;置换 - > Int - > a - > a - > a - > a - > a
harmonicNoise3D p octaves lxyz =谐波八度音
(\\ * - f - > simplex3D p(x * f / l)(y * f / l)(z * f / l))
为了分析,我使用了下面的代码:
main = do start< - getCurrentTime
print $ q()
end< - getCurrentTime
print $ diffUTCTime end start
产生以下信息:
$ p $ 成本中心模块%时间%分配
simplex3D主要18.8 21.0 $ b $ $主要18.0 19.6
主要10.1 9.2
harmonicNoise3D主要9.8 4.5
主要6.4 5.8
int Main 4.0 2.9
gi3主4.0 3.0
xyz2主3.5 5.9
gi1主要3.4 3.4
gi0主要3.4 2.7
快速主要3.2 0.6
xyz1主要2.9 5.9
ijk主要2.7 3.5
gi2主要2.7 3.3
xyz3主要2.6 4.1
iijjkk Main 1.6 2.5
dot3 Main 1.6 0.7
也将该算法移植到C#。那里的表现快了大概3到4倍,所以我想我一定是做错了。但即使如此,它还不如我想要的那么快。所以我的问题是:任何人都可以告诉我是否有任何方法来加速我的实现和/或一般算法,或者有人知道有一个不同的噪声算法,有更好的性能特点,但有相似的外观?
更新: 按照以下提供的一些建议,代码现在看起来如下: 加上我的块大小减少到8x8x128,现在产生新的地形块的速度大约为10-20 fps,这意味着到处走动现在已经不像以前那样成问题了。当然,任何其他的性能改进仍然是值得欢迎的。 最初突出的是你的代码是高度多态。你应该把你的浮点类型统一到 好的。有很多东西可以用来改善这些代码。 改进 数据表示 编译器标记 更高效的库函数 另外,请检查您的算法是否与C#相同,并且您使用的是相同的数据结构。 I'm working on a terrain generation algorithm for a MineCraft-like world. Currently, I'm using simplex noise based on the implementation in the paper 'Simplex Noise Demystified' [PDF], since simplex noise is supposed to be faster and to have fewer artifacts than Perlin noise. This looks fairly decent (see image), but so far it's also pretty slow. Running the noise function 10 times (I need noise with different wavelengths for things like terrain height, temperature, tree location, etc.) with 3 octaves of noise for each block in a chunk (16x16x128 blocks), or about 1 million calls to the noise function in total, takes about 700-800 ms. This is at least an order of magnitude too slow for the purposes of generating terrain with any decent kind of speed, despite the fact that there are no obvious expensive operations in the algorithm (at least to me). Just floor, modulo, some array lookups and basic arithmetic. The algorithm (written in Haskell) is listed below. The SCC comments are for profiling. I've omitted the 2D noise functions, since they work the same way. For profiling, I used the following code, which produces the following information: To compare, I also ported the algorithm to C#. Performance there was about 3 to 4 times faster, so I imagine I must be doing something wrong. But even then it's not nearly as fast as I would like. So my question is this: can anyone tell me if there are any ways to speed up my implementation and/or the algorithm in general or does anyone know of a different noise algorithm that has better performance characteristics but a similar look? Update: After following some of the suggestions offered below, the code now looks as follows: Together with reducing my chunk size to 8x8x128, generating new terrain chunks now occurs at about 10-20 fps, which means moving around is now not nearly as problematic as before. Of course, any other performance improvements are still welcome. The thing that stands out initially is that your code is highly polymorphic. You should specialize your floating point type uniformly to Note, for those trying to reproduce, this code imports:
Ok. There's lots of things you can try to improve this code. Improvements Data representation Compiler flags More efficient library functions Runtime settings Also, check your algorithm is identical to the C# one, and you're using the same data structures. 这篇关于在Haskell中优化数组的性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
模块Noise(Permutation,Perm
,noise3D,simplex3D
)其中
导入Data.Bits
导入限定的Data.Vector.Unboxed作为UV
导入System.Random
导入System.Random.Shuffle
类型Permutation = UV .Vector Int
g3 :: Double
g3 = 1/6
{ - #INLINE int# - }
int :: Int - > ; Double
int = fromIntegral
grad3 :: UV.Vector(Double,Double,Double)
grad3 = UV.fromList $ [(1,1,0),( - (1,0,1),(-1,1,0),( - 1,-1,0),
(1,0,1),( - 1,0,1),(1, 0,-1),( - 1,0,-1),
(0,1,1),(0,-1,1),(0,1,-1),(0, - 1,-1)]
{ - #INLINE dot3# - }
dot3 ::(Double,Double,Double) - >双 - >双 - >双 - > Double
dot3(a,b,c)xyz = a * x + b * y + c * z
{ - #INLINE fastFloor# - }
fastFloor :: Double - > Int
fastFloor x = truncate(如果x> 0则x else x - 1)
- 产生噪声函数中使用的随机置换
perm :: Int - > Permutation
perm seed = UV.fromList。 concat。复制2。 shuffle'[0..255] 256 $ mkStdGen种子
- 产生-0.5到0.5之间的3D噪声
noise3D ::排列 - >双 - >双 - >双 - > ($,$,$)$($,$,$)$($,$,$)$($,$) ,sz)其中sa = fastFloor(a +(x + y + z)/ 3)
(x0,y0,z0)=(x - int i + t,y - int j + t,z - int其中t = int(i + j + k)* g3
(i1,j1,k1,i2,j2,k2)= if x0> = y0
then if y0> = z0则(1,0,0,1,1,0)else
如果x0> = z0则(1,0,0,1,0,1)else(0,0,1,1 ,0,1)
else if y0< z0 then(0,0,1,0,1,1)else
if x0 < z0 then(0,1,0,0,1,1)else(0,1,0,1,1,0)
xyz1 =(x0 - int i1 + g3,y0 - int j1 + g3, z0-int k1 + g3)
xyz2 =(x0-int i2 + 2 * g3,y0 -int j2 + 2 * g3,z0 -int k2 + 2 * g3)
xyz3 =(x0- 1 + 3 * g3,y0-1 + 3 * g3,z0-1 + 3 * g3)
(ii,jj,kk)=(i& 255,j。& 255,k (& 255)
gi0 = rem(UV.unsafeIndex p(ii + UV.unsafeIndex p(jj + UV.unsafeIndex pkk)))12
gi1 = rem(UV.unsafeIndex p ii + i1 + UV.unsafeIndex p(jj + j1 + UV.unsafeIndex p(kk + k1))))12
gi2 = rem(UV.unsafeIndex p(ii + i2 + UV.unsafeIndex p(jj + (UV + UV.unsafeIndex p(kk + k2))))12
gi3 = rem(UV.unsafeIndex p(ii + 1 + UV.unsafeIndex p(jj + 1 + UV.unsafeIndex p(kk + 1) ))12
{ - #INLINE n# - }
n gi(x',y',z')=(\ a - >如果a <0则0 else
a * a * a * a * dot3(UV.unsafeIndex grad3 gi)x'y'z')$ 0.6 - x'* x' - y'* y' - z'* z'
harmo nic :: Int - > (Double - > Double) - >双
谐波八度噪声= f八度/(2-1 / int(2 ^(octave-1)))其中
f 0 = 0
fo = let r = 2 ^^(o - 1)in noise r / r + f(o-1)
--3D simplex noise
--syntax:simplex3D permutation number_of_octaves wavelength xyz
simplex3D :: Permutation - > ; Int - >双 - >双 - >双 - >双 - > Double
simplex3D p octaves lxyz =和音八度音
(\\ * f / l)noise3D p(x * f / l)(y * f / l)(z * f / l))
Double
,所以GHC(和LLVM)有机会应用更积极的优化。
$ b $注意,对于那些试图复制的代码,这个代码会导入:
将限定的Data.Vector导入为V
import Data.Bits
import Data.Time.Clock
import System.Random
import System.Random.Shuffle
type Permutation = V.Vector Int
(a,a,a)
替换为unboxed triple T!Double!Double!Double
Data.Array
切换为 Data.Array.Unboxed
置换
repa
中的多维unboxed数组替换三元组的盒装数组>包
-O2 -fvia-C -optc-O3 -fexcess-precision -optc-march =本地
(或等同于 -fllvm
)
-fspec-constr-count = 16
mod
c> rem VU替换
(移至 V。!
索引。 unsafeIndex Data.Vector.Unboxed
运行时设置
-A20M code>或
-H
g3 :: (Floating a, RealFrac a) => a
g3 = 1/6
{-# INLINE int #-}
int :: (Integral a, Num b) => a -> b
int = fromIntegral
grad3 :: (Floating a, RealFrac a) => V.Vector (a,a,a)
grad3 = V.fromList $ [(1,1,0),(-1, 1,0),(1,-1, 0),(-1,-1, 0),
(1,0,1),(-1, 0,1),(1, 0,-1),(-1, 0,-1),
(0,1,1),( 0,-1,1),(0, 1,-1),( 0,-1,-1)]
{-# INLINE dot3 #-}
dot3 :: Num a => (a, a, a) -> a -> a -> a -> a
dot3 (a,b,c) x y z = a * x + b * y + c * z
{-# INLINE fastFloor #-}
fastFloor :: RealFrac a => a -> Int
fastFloor x = truncate (if x > 0 then x else x - 1)
--Generate a random permutation for use in the noise functions
perm :: Int -> Permutation
perm seed = V.fromList . concat . replicate 2 . shuffle' [0..255] 256 $ mkStdGen seed
--Generate 3D noise between -0.5 and 0.5
simplex3D :: (Floating a, RealFrac a) => Permutation -> a -> a -> a -> a
simplex3D p x y z = {-# SCC "out" #-} 16 * (n gi0 (x0,y0,z0) + n gi1 xyz1 + n gi2 xyz2 + n gi3 xyz3) where
(i,j,k) = {-# SCC "ijk" #-} (s x, s y, s z) where s a = fastFloor (a + (x + y + z) / 3)
(x0,y0,z0) = {-# SCC "x0-z0" #-} (x - int i + t, y - int j + t, z - int k + t) where t = int (i + j + k) * g3
(i1,j1,k1,i2,j2,k2) = {-# SCC "i1-k2" #-} if x0 >= y0
then if y0 >= z0 then (1,0,0,1,1,0) else
if x0 >= z0 then (1,0,0,1,0,1) else (0,0,1,1,0,1)
else if y0 < z0 then (0,0,1,0,1,1) else
if x0 < z0 then (0,1,0,0,1,1) else (0,1,0,1,1,0)
xyz1 = {-# SCC "xyz1" #-} (x0 - int i1 + g3, y0 - int j1 + g3, z0 - int k1 + g3)
xyz2 = {-# SCC "xyz2" #-} (x0 - int i2 + 2*g3, y0 - int j2 + 2*g3, z0 - int k2 + 2*g3)
xyz3 = {-# SCC "xyz3" #-} (x0 - 1 + 3*g3, y0 - 1 + 3*g3, z0 - 1 + 3*g3)
(ii,jj,kk) = {-# SCC "iijjkk" #-} (i .&. 255, j .&. 255, k .&. 255)
gi0 = {-# SCC "gi0" #-} mod (p V.! (ii + p V.! (jj + p V.! kk ))) 12
gi1 = {-# SCC "gi1" #-} mod (p V.! (ii + i1 + p V.! (jj + j1 + p V.! (kk + k1)))) 12
gi2 = {-# SCC "gi2" #-} mod (p V.! (ii + i2 + p V.! (jj + j2 + p V.! (kk + k2)))) 12
gi3 = {-# SCC "gi3" #-} mod (p V.! (ii + 1 + p V.! (jj + 1 + p V.! (kk + 1 )))) 12
{-# INLINE n #-}
n gi (x',y',z') = {-# SCC "n" #-} (\a -> if a < 0 then 0 else
a*a*a*a*dot3 (grad3 V.! gi) x' y' z') $ 0.6 - x'*x' - y'*y' - z'*z'
harmonic :: (Num a, Fractional a) => Int -> (a -> a) -> a
harmonic octaves noise = f octaves / (2 - 1 / int (2 ^ (octaves - 1))) where
f 0 = 0
f o = let r = int $ 2 ^ (o - 1) in noise r / r + f (o - 1)
--Generate harmonic 3D noise between -0.5 and 0.5
harmonicNoise3D :: (RealFrac a, Floating a) => Permutation -> Int -> a -> a -> a -> a -> a
harmonicNoise3D p octaves l x y z = harmonic octaves
(\f -> simplex3D p (x * f / l) (y * f / l) (z * f / l))
q _ = let p = perm 0 in
sum [harmonicNoise3D p 3 l x y z :: Float | l <- [1..10], y <- [0..127], x <- [0..15], z <- [0..15]]
main = do start <- getCurrentTime
print $ q ()
end <- getCurrentTime
print $ diffUTCTime end start
COST CENTRE MODULE %time %alloc
simplex3D Main 18.8 21.0
n Main 18.0 19.6
out Main 10.1 9.2
harmonicNoise3D Main 9.8 4.5
harmonic Main 6.4 5.8
int Main 4.0 2.9
gi3 Main 4.0 3.0
xyz2 Main 3.5 5.9
gi1 Main 3.4 3.4
gi0 Main 3.4 2.7
fastFloor Main 3.2 0.6
xyz1 Main 2.9 5.9
ijk Main 2.7 3.5
gi2 Main 2.7 3.3
xyz3 Main 2.6 4.1
iijjkk Main 1.6 2.5
dot3 Main 1.6 0.7
module Noise ( Permutation, perm
, noise3D, simplex3D
) where
import Data.Bits
import qualified Data.Vector.Unboxed as UV
import System.Random
import System.Random.Shuffle
type Permutation = UV.Vector Int
g3 :: Double
g3 = 1/6
{-# INLINE int #-}
int :: Int -> Double
int = fromIntegral
grad3 :: UV.Vector (Double, Double, Double)
grad3 = UV.fromList $ [(1,1,0),(-1, 1,0),(1,-1, 0),(-1,-1, 0),
(1,0,1),(-1, 0,1),(1, 0,-1),(-1, 0,-1),
(0,1,1),( 0,-1,1),(0, 1,-1),( 0,-1,-1)]
{-# INLINE dot3 #-}
dot3 :: (Double, Double, Double) -> Double -> Double -> Double -> Double
dot3 (a,b,c) x y z = a * x + b * y + c * z
{-# INLINE fastFloor #-}
fastFloor :: Double -> Int
fastFloor x = truncate (if x > 0 then x else x - 1)
--Generate a random permutation for use in the noise functions
perm :: Int -> Permutation
perm seed = UV.fromList . concat . replicate 2 . shuffle' [0..255] 256 $ mkStdGen seed
--Generate 3D noise between -0.5 and 0.5
noise3D :: Permutation -> Double -> Double -> Double -> Double
noise3D p x y z = 16 * (n gi0 (x0,y0,z0) + n gi1 xyz1 + n gi2 xyz2 + n gi3 xyz3) where
(i,j,k) = (s x, s y, s z) where s a = fastFloor (a + (x + y + z) / 3)
(x0,y0,z0) = (x - int i + t, y - int j + t, z - int k + t) where t = int (i + j + k) * g3
(i1,j1,k1,i2,j2,k2) = if x0 >= y0
then if y0 >= z0 then (1,0,0,1,1,0) else
if x0 >= z0 then (1,0,0,1,0,1) else (0,0,1,1,0,1)
else if y0 < z0 then (0,0,1,0,1,1) else
if x0 < z0 then (0,1,0,0,1,1) else (0,1,0,1,1,0)
xyz1 = (x0 - int i1 + g3, y0 - int j1 + g3, z0 - int k1 + g3)
xyz2 = (x0 - int i2 + 2*g3, y0 - int j2 + 2*g3, z0 - int k2 + 2*g3)
xyz3 = (x0 - 1 + 3*g3, y0 - 1 + 3*g3, z0 - 1 + 3*g3)
(ii,jj,kk) = (i .&. 255, j .&. 255, k .&. 255)
gi0 = rem (UV.unsafeIndex p (ii + UV.unsafeIndex p (jj + UV.unsafeIndex p kk ))) 12
gi1 = rem (UV.unsafeIndex p (ii + i1 + UV.unsafeIndex p (jj + j1 + UV.unsafeIndex p (kk + k1)))) 12
gi2 = rem (UV.unsafeIndex p (ii + i2 + UV.unsafeIndex p (jj + j2 + UV.unsafeIndex p (kk + k2)))) 12
gi3 = rem (UV.unsafeIndex p (ii + 1 + UV.unsafeIndex p (jj + 1 + UV.unsafeIndex p (kk + 1 )))) 12
{-# INLINE n #-}
n gi (x',y',z') = (\a -> if a < 0 then 0 else
a*a*a*a*dot3 (UV.unsafeIndex grad3 gi) x' y' z') $ 0.6 - x'*x' - y'*y' - z'*z'
harmonic :: Int -> (Double -> Double) -> Double
harmonic octaves noise = f octaves / (2 - 1 / int (2 ^ (octaves - 1))) where
f 0 = 0
f o = let r = 2 ^^ (o - 1) in noise r / r + f (o - 1)
--3D simplex noise
--syntax: simplex3D permutation number_of_octaves wavelength x y z
simplex3D :: Permutation -> Int -> Double -> Double -> Double -> Double -> Double
simplex3D p octaves l x y z = harmonic octaves
(\f -> noise3D p (x * f / l) (y * f / l) (z * f / l))
Double
, so GHC (and LLVM) have a chance of applying more aggressive optimizations.import qualified Data.Vector as V
import Data.Bits
import Data.Time.Clock
import System.Random
import System.Random.Shuffle
type Permutation = V.Vector Int
(a,a,a)
with unboxed triple T !Double !Double !Double
Data.Array
to Data.Array.Unboxed
for Permutations
repa
package
-O2 -fvia-C -optc-O3 -fexcess-precision -optc-march=native
(or equivalent with -fllvm
)-fspec-constr-count=16
mod
with rem
V.!
indexing with unchecked indexing VU.unsafeIndex
(after moving to Data.Vector.Unboxed
-A20M
or -H