How to apply data parallelism on a Haskell Fast Fourier Transformation?




    I have Haskell code that computes a Fast Fourier Transform, and I want to apply data parallelism to it. However, every strategy I have used generates too many sparks, and most of them overflow.

    Does anyone have any idea how to apply a good data-parallelism strategy to the following algorithm?

    -- radix 2 Cooley-Tukey FFT
    
    import Data.Complex
    
    fft :: [Complex Float] -> [Complex Float]
    fft [a] = [a]
    fft as = interleave ls rs
      where
        (cs,ds) = bflyS as
        ls = fft cs
        rs = fft ds
    
    interleave [] bs = bs
    interleave (a:as) bs = a : interleave bs as
    
    bflyS :: [Complex Float] -> ([Complex Float], [Complex Float])
    bflyS as = (los,rts)
      where
        (ls,rs) = halve as
        los = zipWith (+) ls rs
        ros = zipWith (-) ls rs
        rts = zipWith (*) ros [tw (length as) i | i <- [0..(length ros) - 1]]
    
    halve as = splitAt n' as
      where
        n' = div (length as + 1) 2
    
    -- twiddle factors
    tw :: Int -> Int -> Complex Float
    tw n k = cis (-2 * pi * fromIntegral k / fromIntegral n)
    

    PAR MONAD

    The answer from leftaroundabout helped me a lot in understanding how to apply data parallelism to the code. However, I have studied the Par monad and tried to apply task parallelism with it. The problem is that it runs much slower than the original bflyS. I think the code I developed makes forking threads too expensive relative to the work being done. Does anyone know how to use the Par monad in a better way?

    -- New Task Parallelism bflyS
    
    import Control.Monad.Par              -- from the monad-par package (assumed)
    import qualified Data.List as DATA    -- assuming DATA refers to Data.List
    
    bflySTask :: [Complex Float] -> ([Complex Float], [Complex Float])
    bflySTask as = runPar $ do
        let (ls, rs) = halve as
        v1 <- new
        v2 <- new
        let ros = DATA.zipWith (-) ls rs
        let aux = DATA.map  (tw n) [0..n-1]
        fork $ put v1 (DATA.zipWith (+) ls rs)
        fork $ put v2 (DATA.zipWith (*) ros aux)
        los <- get v1
        rts <- get v2   
        return (los, rts)
      where
        n = DATA.length as
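For comparison, here is a coarser-grained sketch of the same butterfly (my own, not from the thread; it assumes the monad-par package and uses plain Prelude functions in place of the DATA-qualified ones, with a hypothetical size threshold of 512): below the threshold the fork overhead dominates the work, so it simply falls back to the sequential version.

```haskell
import Control.Monad.Par (runPar, spawnP, get)
import Data.Complex (Complex(..), cis)

-- Helpers repeated from the question so the sketch is self-contained.
halve :: [a] -> ([a], [a])
halve as = splitAt (div (length as + 1) 2) as

tw :: Int -> Int -> Complex Float
tw n k = cis (-2 * pi * fromIntegral k / fromIntegral n)

bflyS :: [Complex Float] -> ([Complex Float], [Complex Float])
bflyS as = (los, rts)
  where
    (ls, rs) = halve as
    los = zipWith (+) ls rs
    ros = zipWith (-) ls rs
    rts = zipWith (*) ros [tw (length as) i | i <- [0 .. length ros - 1]]

-- Coarser-grained variant: only pay for runPar when each branch
-- carries enough work to amortise the forks.
bflySTask' :: [Complex Float] -> ([Complex Float], [Complex Float])
bflySTask' as
  | length as < 512 = bflyS as           -- too small: run sequentially
  | otherwise = runPar $ do
      v1 <- spawnP (zipWith (+) ls rs)   -- spawnP combines fork and put
      v2 <- spawnP (zipWith (*) ros aux)
      los <- get v1
      rts <- get v2
      return (los, rts)
  where
    n = length as
    (ls, rs) = halve as
    ros = zipWith (-) ls rs
    aux = map (tw n) [0 .. n - 1]
```

Both branches compute exactly the same values as the sequential bflyS; only where the evaluation happens changes.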
    

    Solution

    First off: there's a lot of optimisation to be done here before I'd start to think about parallelism:

    • Lists rock, but their non-consecutive memory model means they just can't allow for traversals nearly as fast as what you can achieve with tight arrays such as Data.Vector, because you inevitably end up with lots of cache misses. Indeed, I've seldom seen a list-based algorithm gain much from parallelisation, because it's bound by memory rather than CPU performance.

    • Your twiddle factors are computed over and over again; you can obviously gain a lot through memoisation here.

    • You keep on calling length, but that's an extremely wasteful function (O(n) for something that could be O(1)). Use some container that properly handles length; lists aren't meant to (we like to keep their ability to be infinite).
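To make the memoisation point concrete, here is a minimal sketch (mine, not from the answer) that builds all the twiddle factors once per transform size with Data.Array, which ships with GHC; every later use is then an O(1) lookup instead of a fresh trigonometric evaluation:

```haskell
import Data.Array (Array, listArray, (!))
import Data.Complex (Complex(..), cis, magnitude)

-- All n twiddle factors for an n-point FFT, computed exactly once.
twiddleTable :: Int -> Array Int (Complex Float)
twiddleTable n = listArray (0, n - 1)
  [ cis (-2 * pi * fromIntegral k / fromIntegral n) | k <- [0 .. n - 1] ]
```

In bflyS one could then write, say, `rts = zipWith (*) ros [tbl ! i | i <- [0 ..]]` with `tbl = twiddleTable (length as)`; since zipWith stops at the shorter list, only the needed entries are ever looked up.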

    The parallelisation itself will be pretty simple; I'd check on the length as suggested by John L. Indeed, I'd rather require a pretty large size before sparking a thread, at least something like 256: as the performance probably becomes crucial only at sizes of several thousands, this should still be enough threads to keep your cores busy.

    import qualified Data.Vector.Unboxed as UBV
    import Control.Parallel.Strategies
    import Data.Complex
    
    type ℂ = Complex Float
    
    -- interleave and bflyS here stand for vector versions of the list
    -- functions above
    fft' :: UBV.Vector ℂ -> UBV.Vector ℂ
    fft' aₓs = interleave lᵥs rᵥs
     where (lᵥs, rᵥs) = (fft' lₓs, fft' rₓs)
                          `using` if UBV.length aₓs > 256
                                    then parTuple2 rseq rseq
                                    else r0
           (lₓs, rₓs) = bflyS aₓs
    
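Putting the size-threshold idea together with the asker's original list code gives the following end-to-end sketch (my own assembly, assuming the parallel package for Control.Parallel.Strategies): sparks are created only while the input is longer than 256 elements, which bounds the total number of sparks instead of letting them overflow.

```haskell
import Control.Parallel.Strategies (using, parTuple2, rdeepseq, r0)
import Data.Complex (Complex(..), cis, magnitude)

-- Helpers repeated verbatim from the question so the sketch is
-- self-contained.
interleave :: [a] -> [a] -> [a]
interleave []     bs = bs
interleave (a:as) bs = a : interleave bs as

halve :: [a] -> ([a], [a])
halve as = splitAt (div (length as + 1) 2) as

tw :: Int -> Int -> Complex Float
tw n k = cis (-2 * pi * fromIntegral k / fromIntegral n)

bflyS :: [Complex Float] -> ([Complex Float], [Complex Float])
bflyS as = (los, rts)
  where
    (ls, rs) = halve as
    los = zipWith (+) ls rs
    ros = zipWith (-) ls rs
    rts = zipWith (*) ros [tw (length as) i | i <- [0 .. length ros - 1]]

-- Parallel FFT: spark the two recursive halves only while the input is
-- longer than 256 elements, so the spark count stays bounded.  (The
-- repeated length call is still O(n); a length-carrying container would
-- fix that, as noted above.)
fftP :: [Complex Float] -> [Complex Float]
fftP [a] = [a]
fftP as  = interleave ls rs
  where
    (cs, ds) = bflyS as
    (ls, rs) = (fftP cs, fftP ds)
                 `using` if length as > 256
                           then parTuple2 rdeepseq rdeepseq
                           else r0
```

Compiled with `-threaded` and run with `+RTS -N`, this keeps roughly one spark per large recursion level rather than one per element.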
