在Haskell中优化数组的性能 [英] Optimizing numerical array performance in Haskell

查看:189
本文介绍了在Haskell中优化数组的性能的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在研究类似MineCraft的世界的地形生成算法。目前,我正在使用单纯的噪音,基于在这篇文章中的实现单纯噪音揭秘[PDF] ,因为单纯的噪声应该比Perlin噪声更快,并且产生更少的伪像。这看起来相当不错(见图片),但到目前为止它也很慢。




运行噪声函数10次(我需要不同波长的噪声,例如地形高度,温度,树的位置等等),一个块(16×16×128块)中的每个块具有3个八度的噪声,或者总共大约一百万个对噪声功能的调用,大约需要700-800毫秒。尽管事实上在算法中没有明显的昂贵的操作(至少对我来说),但对于产生具有任何体面的速度的地形而言,这至少是一个数量级太慢的速度。只是楼,模,一些数组查找和基本的算术。下面列出了算法(用Haskell编写)。 SCC的意见是为了分析。我已经省略了二维噪声函数,因为它们的工作方式是一样的。

  g3 ::(Floating a,RealFrac a) => a 
g3 = 1/6

{ - #INLINE int# - }
int ::(积分a,数字b)=> a - > b
int = fromIntegral

grad3 ::(Floating a,RealFrac a)=> V.Vector(a,a,a)
grad3 = V.fromList $ [(1,1,0),( - 1,1,0),(1,-1,0),( - 1 ,-1,0),
(1,0,1),( - 1,0,0),(1,0,-1),( - 1,0,-1),
(0,1,1),(0,-1,1),(0,1,-1),(0,-1,-1)]

{ - #INLINE dot3 # - }
dot3 :: Num a => (a,a,a) - > a - > a - > a - > a
dot3(a,b,c)xyz = a * x + b * y + c * z

{ - #INLINE fastFloor# - }
fastFloor :: RealFrac a => a - > Int
fastFloor x = truncate(如果x> 0则x else x - 1)

- 产生噪声函数中使用的随机置换
perm :: Int - >排列
perm seed = V.fromList。 concat。复制2。 shuffle'[0..255] 256 $ mkStdGen种子

- 产生-0.5到0.5之间的3D噪音
simplex3D ::(Floating a,RealFrac a)=>置换 - > a - > a - > a - >
simplex3D pxyz = { - #SCCout# - } 16 *(n gi0(x0,y0,z0)+ n gi1 xyz1 + n gi2 xyz2 + n gi3 xyz3) (x,y,z)/ 3)
(x0,y0,z0) )= { - #SCCx0-z0# - }(x - int i + t,y - int j + t,z - int k + t)其中t = int(i + j + k)* g3 $如果x0> = y0
,那么如果y0> = z0,则b(b1,b1,j1,k1,i2,j2,k2)= { - #SCCi1-k2 0,0,1,1,0)否则
如果x0> = z0那么(1,0,0,1,0,1)else(0,0,1,1,0,1)
else if y0< z0 then(0,0,1,0,1,1)else
if x0 < z0 then(0,1,0,0,1,1)else(0,1,0,1,1,0)
xyz1 = { - #SCCxyz1# - }(x0 - int i1 + g3,y0-int j1 + g3,z0-int k1 + g3)
xyz2 = { - #SCCxyz2# - }(x0 - int i2 + 2 * g3,y0 - int j2 + 2 * g3,z0-int k2 + 2 * g3)
xyz3 = { - #SCCxyz3# - }(x0-1 + 3 * g3,y0-1 + 3 * g3,z0-1 + 3 * g3)
(ii,jj,kk)= { - #SCCiijjkk# - }(i。& 255,j。& 255,k。& 255)
gi0 = { - #SCCgi0# - } mod(pV!!(ii + pV!!(jj + pV!!kk)))12
gi1 = { - #SCCgi1 ()()()()()()()()()()() - } mod(p V!!(ii + i2 + p V!!(jj + j2 + p V!!(kk + k2))))12
gi3 = { - #SCCgi3# - (p + 1 + pV!!(kk + 1))))12
{ - #INLINE n# - }
n gi(x',y',z')= { - #SCCn# - }(\ a - >> if a <0 then 0 else
a * a * a * a * dot3 grad3 V.!gi)x'y'z')$ 0.6 - x'* x' - y'* y' - z * Z

谐波::(民一个,分数A)=> Int - > (a - > a) - > a
谐波八度噪声= f八度/(2-1 / int(2 ^(octave-1)))其中
f 0 = 0
fo = let r = int $ 2 ^(o - 1)in noise r / r + f(o - 1)

- 介于-0.5和0.5
之间的泛音三维噪声harmonicNoise3D ::(RealFrac a,Floating a)=> ;置换 - > Int - > a - > a - > a - > a - > a
harmonicNoise3D p octaves lxyz =谐波八度音
(\\ * - f - > simplex3D p(x * f / l)(y * f / l)(z * f / l))

为了分析,我使用了下面的代码:

 
main = do start< - getCurrentTime
print $ q()
end< - getCurrentTime
print $ diffUTCTime end start



产生以下信息:

$ p $ 成本中心模块%时间%分配

simplex3D主要18.8 21.0 $ b $ $主要18.0 19.6
主要10.1 9.2
harmonicNoise3D主要9.8 4.5
主要6.4 5.8
int Main 4.0 2.9
gi3主4.0 3.0
xyz2主3.5 5.9
gi1主要3.4 3.4
gi0主要3.4 2.7
快速主要3.2 0.6
xyz1主要2.9 5.9
ijk主要2.7 3.5
gi2主要2.7 3.3
xyz3主要2.6 4.1
iijjkk Main 1.6 2.5
dot3 Main 1.6 0.7

也将该算法移植到C#。那里的表现快了大概3到4倍,所以我想我一定是做错了。但即使如此,它还不如我想要的那么快。所以我的问题是:任何人都可以告诉我是否有任何方法来加速我的实现和/或一般算法,或者有人知道有一个不同的噪声算法,有更好的性能特点,但有相似的外观?



更新:

按照以下提供的一些建议,代码现在看起来如下:

 模块Noise(Permutation,Perm 
,noise3D,simplex3D
)其中

导入Data.Bits
导入限定的Data.Vector.Unboxed作为UV
导入System.Random
导入System.Random.Shuffle

类型Permutation = UV .Vector Int

g3 :: Double
g3 = 1/6

{ - #INLINE int# - }
int :: Int - > ; Double
int = fromIntegral

grad3 :: UV.Vector(Double,Double,Double)
grad3 = UV.fromList $ [(1,1,0),( - (1,0,1),(-1,1,0),( - 1,-1,0),
(1,0,1),( - 1,0,1),(1, 0,-1),( - 1,0,-1),
(0,1,1),(0,-1,1),(0,1,-1),(0, - 1,-1)]

{ - #INLINE dot3# - }
dot3 ::(Double,Double,Double) - >双 - >双 - >双 - > Double
dot3(a,b,c)xyz = a * x + b * y + c * z

{ - #INLINE fastFloor# - }
fastFloor :: Double - > Int
fastFloor x = truncate(如果x> 0则x else x - 1)

- 产生噪声函数中使用的随机置换
perm :: Int - > Permutation
perm seed = UV.fromList。 concat。复制2。 shuffle'[0..255] 256 $ mkStdGen种子

- 产生-0.5到0.5之间的3D噪声
noise3D ::排列 - >双 - >双 - >双 - > ($,$,$)$($,$,$)$($,$,$)$($,$) ,sz)其中sa = fastFloor(a +(x + y + z)/ 3)
(x0,y0,z0)=(x - int i + t,y - int j + t,z - int其中t = int(i + j + k)* g3
(i1,j1,k1,i2,j2,k2)= if x0> = y0
then if y0> = z0则(1,0,0,1,1,0)else
如果x0> = z0则(1,0,0,1,0,1)else(0,0,1,1 ,0,1)
else if y0< z0 then(0,0,1,0,1,1)else
if x0 < z0 then(0,1,0,0,1,1)else(0,1,0,1,1,0)
xyz1 =(x0 - int i1 + g3,y0 - int j1 + g3, z0-int k1 + g3)
xyz2 =(x0-int i2 + 2 * g3,y0 -int j2 + 2 * g3,z0 -int k2 + 2 * g3)
xyz3 =(x0- 1 + 3 * g3,y0-1 + 3 * g3,z0-1 + 3 * g3)
(ii,jj,kk)=(i& 255,j。& 255,k (& 255)
gi0 = rem(UV.unsafeIndex p(ii + UV.unsafeIndex p(jj + UV.unsafeIndex pkk)))12
gi1 = rem(UV.unsafeIndex p ii + i1 + UV.unsafeIndex p(jj + j1 + UV.unsafeIndex p(kk + k1))))12
gi2 = rem(UV.unsafeIndex p(ii + i2 + UV.unsafeIndex p(jj + (UV + UV.unsafeIndex p(kk + k2))))12
gi3 = rem(UV.unsafeIndex p(ii + 1 + UV.unsafeIndex p(jj + 1 + UV.unsafeIndex p(kk + 1) ))12
{ - #INLINE n# - }
n gi(x',y',z')=(\ a - >如果a <0则0 else
a * a * a * a * dot3(UV.unsafeIndex grad3 gi)x'y'z')$ 0.6 - x'* x' - y'* y' - z'* z'

harmo nic :: Int - > (Double - > Double) - >双
谐波八度噪声= f八度/(2-1 / int(2 ^(octave-1)))其中
f 0 = 0
fo = let r = 2 ^^(o - 1)in noise r / r + f(o-1)

--3D simplex noise
--syntax:simplex3D permutation number_of_octaves wavelength xyz
simplex3D :: Permutation - > ; Int - >双 - >双 - >双 - >双 - > Double
simplex3D p octaves lxyz =和音八度音
(\\ * f / l)noise3D p(x * f / l)(y * f / l)(z * f / l))

加上我的块大小减少到8x8x128,现在产生新的地形块的速度大约为10-20 fps,这意味着到处走动现在已经不像以前那样成问题了。当然,任何其他的性能改进仍然是值得欢迎的。

解决方案

最初突出的是你的代码是高度多态。你应该把你的浮点类型统一到 Double ,所以GHC(和LLVM)有机会应用更积极的优化。
$ b $注意,对于那些试图复制的代码,这个代码会导入:

 将限定的Data.Vector导入为V 
import Data.Bits
import Data.Time.Clock
import System.Random
import System.Random.Shuffle

type Permutation = V.Vector Int






好的。有很多东西可以用来改善这些代码。



改进

数据表示


  • 专用于具体的浮点类型,而不是多态的浮点函数b $ b
  • 将元组(a,a,a)替换为unboxed triple T!Double!Double!Double

  • Data.Array 切换为 Data.Array.Unboxed 置换

  • repa 中的多维unboxed数组替换三元组的盒装数组>包



编译器标记


  • 编译 -O2 -fvia-C -optc-O3 -fexcess-precision -optc-march =本地(或等同于 -fllvm

  • 增加spec constr阈值 - -fspec-constr-count = 16



更高效的库函数


  • 使用mersenne-random来代替StdGen来生成随机数。
  • 用<$ c $替换 mod c> rem
  • 用$ unchecked indexing VU替换 V。!索引。 unsafeIndex (移至 Data.Vector.Unboxed


运行时设置


  • 增加默认分配区域: -A20M code>或 -H






另外,请检查您的算法是否与C#相同,并且您使用的是相同的数据结构。


I'm working on a terrain generation algorithm for a MineCraft-like world. Currently, I'm using simplex noise based on the implementation in the paper 'Simplex Noise Demystified' [PDF], since simplex noise is supposed to be faster and to have fewer artifacts than Perlin noise. This looks fairly decent (see image), but so far it's also pretty slow.

Running the noise function 10 times (I need noise with different wavelengths for things like terrain height, temperature, tree location, etc.) with 3 octaves of noise for each block in a chunk (16x16x128 blocks), or about 1 million calls to the noise function in total, takes about 700-800 ms. This is at least an order of magnitude too slow for the purposes of generating terrain with any decent kind of speed, despite the fact that there are no obvious expensive operations in the algorithm (at least to me). Just floor, modulo, some array lookups and basic arithmetic. The algorithm (written in Haskell) is listed below. The SCC comments are for profiling. I've omitted the 2D noise functions, since they work the same way.

g3 :: (Floating a, RealFrac a) => a
g3 = 1/6

{-# INLINE int #-}
int :: (Integral a, Num b) => a -> b
int = fromIntegral

grad3 :: (Floating a, RealFrac a) => V.Vector (a,a,a)
grad3 = V.fromList $ [(1,1,0),(-1, 1,0),(1,-1, 0),(-1,-1, 0),
                     (1,0,1),(-1, 0,1),(1, 0,-1),(-1, 0,-1),
                     (0,1,1),( 0,-1,1),(0, 1,-1),( 0,-1,-1)]

{-# INLINE dot3 #-}
dot3 :: Num a => (a, a, a) -> a -> a -> a -> a
dot3 (a,b,c) x y z = a * x + b * y + c * z

{-# INLINE fastFloor #-}
fastFloor :: RealFrac a => a -> Int
fastFloor x = truncate (if x > 0 then x else x - 1)

--Generate a random permutation for use in the noise functions
perm :: Int -> Permutation
perm seed = V.fromList . concat . replicate 2 . shuffle' [0..255] 256 $ mkStdGen seed

--Generate 3D noise between -0.5 and 0.5
simplex3D :: (Floating a, RealFrac a) => Permutation -> a -> a -> a -> a
simplex3D p x y z = {-# SCC "out" #-} 16 * (n gi0 (x0,y0,z0) + n gi1 xyz1 + n gi2 xyz2 + n gi3 xyz3) where
    (i,j,k) = {-# SCC "ijk" #-} (s x, s y, s z) where s a = fastFloor (a + (x + y + z) / 3)
    (x0,y0,z0) = {-# SCC "x0-z0" #-} (x - int i + t, y - int j + t, z - int k + t) where t = int (i + j + k) * g3
    (i1,j1,k1,i2,j2,k2) = {-# SCC "i1-k2" #-} if x0 >= y0
        then if y0 >= z0 then (1,0,0,1,1,0) else
             if x0 >= z0 then (1,0,0,1,0,1) else (0,0,1,1,0,1)
        else if y0 <  z0 then (0,0,1,0,1,1) else
             if x0 <  z0 then (0,1,0,0,1,1) else (0,1,0,1,1,0)
    xyz1 = {-# SCC "xyz1" #-} (x0 - int i1 +   g3, y0 - int j1 +   g3, z0 - int k1 +   g3)
    xyz2 = {-# SCC "xyz2" #-} (x0 - int i2 + 2*g3, y0 - int j2 + 2*g3, z0 - int k2 + 2*g3)
    xyz3 = {-# SCC "xyz3" #-} (x0 - 1      + 3*g3, y0 - 1      + 3*g3, z0 - 1      + 3*g3)
    (ii,jj,kk) = {-# SCC "iijjkk" #-} (i .&. 255, j .&. 255, k .&. 255)
    gi0 = {-# SCC "gi0" #-} mod (p V.! (ii +      p V.! (jj +      p V.!  kk      ))) 12
    gi1 = {-# SCC "gi1" #-} mod (p V.! (ii + i1 + p V.! (jj + j1 + p V.! (kk + k1)))) 12
    gi2 = {-# SCC "gi2" #-} mod (p V.! (ii + i2 + p V.! (jj + j2 + p V.! (kk + k2)))) 12
    gi3 = {-# SCC "gi3" #-} mod (p V.! (ii + 1  + p V.! (jj + 1  + p V.! (kk + 1 )))) 12
    {-# INLINE n #-}
    n gi (x',y',z') = {-# SCC "n" #-} (\a -> if a < 0 then 0 else
        a*a*a*a*dot3 (grad3 V.! gi) x' y' z') $ 0.6 - x'*x' - y'*y' - z'*z'

harmonic :: (Num a, Fractional a) => Int -> (a -> a) -> a
harmonic octaves noise = f octaves / (2 - 1 / int (2 ^ (octaves - 1))) where
    f 0 = 0
    f o = let r = int $ 2 ^ (o - 1) in noise r / r + f (o - 1)

--Generate harmonic 3D noise between -0.5 and 0.5
harmonicNoise3D :: (RealFrac a, Floating a) => Permutation -> Int -> a -> a -> a -> a -> a
harmonicNoise3D p octaves l x y z = harmonic octaves
    (\f -> simplex3D p (x * f / l) (y * f / l) (z * f / l))

For profiling, I used the following code,

q _ = let p = perm 0 in
      sum [harmonicNoise3D p 3 l x y z :: Float | l <- [1..10], y <- [0..127], x <- [0..15], z <- [0..15]]

main = do start <- getCurrentTime
          print $ q ()
          end <- getCurrentTime
          print $ diffUTCTime end start

which produces the following information:

COST CENTRE                    MODULE               %time %alloc

simplex3D                      Main                  18.8   21.0
n                              Main                  18.0   19.6
out                            Main                  10.1    9.2
harmonicNoise3D                Main                   9.8    4.5
harmonic                       Main                   6.4    5.8
int                            Main                   4.0    2.9
gi3                            Main                   4.0    3.0
xyz2                           Main                   3.5    5.9
gi1                            Main                   3.4    3.4
gi0                            Main                   3.4    2.7
fastFloor                      Main                   3.2    0.6
xyz1                           Main                   2.9    5.9
ijk                            Main                   2.7    3.5
gi2                            Main                   2.7    3.3
xyz3                           Main                   2.6    4.1
iijjkk                         Main                   1.6    2.5
dot3                           Main                   1.6    0.7

To compare, I also ported the algorithm to C#. Performance there was about 3 to 4 times faster, so I imagine I must be doing something wrong. But even then it's not nearly as fast as I would like. So my question is this: can anyone tell me if there are any ways to speed up my implementation and/or the algorithm in general or does anyone know of a different noise algorithm that has better performance characteristics but a similar look?

Update:

After following some of the suggestions offered below, the code now looks as follows:

module Noise ( Permutation, perm
             , noise3D, simplex3D
             ) where

import Data.Bits
import qualified Data.Vector.Unboxed as UV
import System.Random
import System.Random.Shuffle

type Permutation = UV.Vector Int

g3 :: Double
g3 = 1/6

{-# INLINE int #-}
int :: Int -> Double
int = fromIntegral

grad3 :: UV.Vector (Double, Double, Double)
grad3 = UV.fromList $ [(1,1,0),(-1, 1,0),(1,-1, 0),(-1,-1, 0),
                     (1,0,1),(-1, 0,1),(1, 0,-1),(-1, 0,-1),
                     (0,1,1),( 0,-1,1),(0, 1,-1),( 0,-1,-1)]

{-# INLINE dot3 #-}
dot3 :: (Double, Double, Double) -> Double -> Double -> Double -> Double
dot3 (a,b,c) x y z = a * x + b * y + c * z

{-# INLINE fastFloor #-}
fastFloor :: Double -> Int
fastFloor x = truncate (if x > 0 then x else x - 1)

--Generate a random permutation for use in the noise functions
perm :: Int -> Permutation
perm seed = UV.fromList . concat . replicate 2 . shuffle' [0..255] 256 $ mkStdGen seed

--Generate 3D noise between -0.5 and 0.5
noise3D :: Permutation -> Double -> Double -> Double -> Double
noise3D p x y z = 16 * (n gi0 (x0,y0,z0) + n gi1 xyz1 + n gi2 xyz2 + n gi3 xyz3) where
    (i,j,k) = (s x, s y, s z) where s a = fastFloor (a + (x + y + z) / 3)
    (x0,y0,z0) = (x - int i + t, y - int j + t, z - int k + t) where t = int (i + j + k) * g3
    (i1,j1,k1,i2,j2,k2) = if x0 >= y0
        then if y0 >= z0 then (1,0,0,1,1,0) else
             if x0 >= z0 then (1,0,0,1,0,1) else (0,0,1,1,0,1)
        else if y0 <  z0 then (0,0,1,0,1,1) else
             if x0 <  z0 then (0,1,0,0,1,1) else (0,1,0,1,1,0)
    xyz1 = (x0 - int i1 +   g3, y0 - int j1 +   g3, z0 - int k1 +   g3)
    xyz2 = (x0 - int i2 + 2*g3, y0 - int j2 + 2*g3, z0 - int k2 + 2*g3)
    xyz3 = (x0 - 1      + 3*g3, y0 - 1      + 3*g3, z0 - 1      + 3*g3)
    (ii,jj,kk) = (i .&. 255, j .&. 255, k .&. 255)
    gi0 = rem (UV.unsafeIndex p (ii +      UV.unsafeIndex p (jj +      UV.unsafeIndex p  kk      ))) 12
    gi1 = rem (UV.unsafeIndex p (ii + i1 + UV.unsafeIndex p (jj + j1 + UV.unsafeIndex p (kk + k1)))) 12
    gi2 = rem (UV.unsafeIndex p (ii + i2 + UV.unsafeIndex p (jj + j2 + UV.unsafeIndex p (kk + k2)))) 12
    gi3 = rem (UV.unsafeIndex p (ii + 1  + UV.unsafeIndex p (jj + 1  + UV.unsafeIndex p (kk + 1 )))) 12
    {-# INLINE n #-}
    n gi (x',y',z') = (\a -> if a < 0 then 0 else
        a*a*a*a*dot3 (UV.unsafeIndex grad3 gi) x' y' z') $ 0.6 - x'*x' - y'*y' - z'*z'

harmonic :: Int -> (Double -> Double) -> Double
harmonic octaves noise = f octaves / (2 - 1 / int (2 ^ (octaves - 1))) where
    f 0 = 0
    f o = let r = 2 ^^ (o - 1) in noise r / r + f (o - 1)

--3D simplex noise
--syntax: simplex3D permutation number_of_octaves wavelength x y z
simplex3D :: Permutation -> Int -> Double -> Double -> Double -> Double -> Double
simplex3D p octaves l x y z = harmonic octaves
    (\f -> noise3D p (x * f / l) (y * f / l) (z * f / l))

Together with reducing my chunk size to 8x8x128, generating new terrain chunks now occurs at about 10-20 fps, which means moving around is now not nearly as problematic as before. Of course, any other performance improvements are still welcome.

解决方案

The thing that stands out initially is that your code is highly polymorphic. You should specialize your floating point type uniformly to Double, so GHC (and LLVM) have a chance of applying more aggressive optimizations.

Note, for those trying to reproduce, this code imports:

import qualified Data.Vector as V
import Data.Bits
import Data.Time.Clock
import System.Random
import System.Random.Shuffle

type Permutation = V.Vector Int


Ok. There's lots of things you can try to improve this code.

Improvements

Data representation

  • Specialize to a concrete floating point type, instead of polymorphic floating point functions
  • Replace tuple (a,a,a) with unboxed triple T !Double !Double !Double
  • Switch from Data.Array to Data.Array.Unboxed for Permutations
  • Replace use of boxed array of triples with multidimensional unboxed array from repa package

Compiler flags

  • Compile with -O2 -fvia-C -optc-O3 -fexcess-precision -optc-march=native (or equivalent with -fllvm)
  • Increase spec constr threshold -- -fspec-constr-count=16

More efficient library functions

  • Use mersenne-random instead of StdGen to generate randoms
  • Replace mod with rem
  • Replace V.! indexing with unchecked indexing VU.unsafeIndex (after moving to Data.Vector.Unboxed

Runtime settings

  • Increase the default allocation area: -A20M or -H

Also, check your algorithm is identical to the C# one, and you're using the same data structures.

这篇关于在Haskell中优化数组的性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆