How to apply data parallelism on a Haskell Fast Fourier Transformation?
Problem description
I have Haskell code that computes a Fast Fourier Transformation, and I want to apply data parallelism to it. However, every strategy I have tried generates too many sparks, and most of them overflow.
Does anyone have an idea of how to apply a good data-parallelism strategy to the following algorithm?
-- radix 2 Cooley-Tukey FFT
fft :: [Complex Float] -> [Complex Float]
fft [a] = [a]
fft as  = interleave ls rs
  where
    (cs, ds) = bflyS as
    ls = fft cs
    rs = fft ds

interleave [] bs     = bs
interleave (a:as) bs = a : interleave bs as

bflyS :: [Complex Float] -> ([Complex Float], [Complex Float])
bflyS as = (los, rts)
  where
    (ls, rs) = halve as
    los = zipWith (+) ls rs
    ros = zipWith (-) ls rs
    rts = zipWith (*) ros [tw (length as) i | i <- [0 .. length ros - 1]]

halve as = splitAt n' as
  where
    n' = div (length as + 1) 2

-- twiddle factors
tw :: Int -> Int -> Complex Float
tw n k = cis (-2 * pi * fromIntegral k / fromIntegral n)
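One standard way to keep the spark count bounded is to parallelise only the top levels of the recursion. The sketch below is not from the original post: `fftDepth` and the depth cutoff are my own names and choices, using `rpar`/`rseq` from the `parallel` package's `Control.Parallel.Strategies`. The question's sequential definitions are repeated so the snippet compiles on its own. With depth `d`, at most 2^d sparks are created instead of one per recursive call.

```haskell
import Data.Complex (Complex(..), cis, magnitude)
import Control.Parallel.Strategies (runEval, rpar, rseq)

-- Sequential pieces, as in the question.
fft :: [Complex Float] -> [Complex Float]
fft [a] = [a]
fft as  = interleave (fft cs) (fft ds)
  where (cs, ds) = bflyS as

interleave :: [a] -> [a] -> [a]
interleave [] bs     = bs
interleave (a:as) bs = a : interleave bs as

bflyS :: [Complex Float] -> ([Complex Float], [Complex Float])
bflyS as = (los, rts)
  where
    (ls, rs) = halve as
    los = zipWith (+) ls rs
    ros = zipWith (-) ls rs
    rts = zipWith (*) ros [tw (length as) i | i <- [0 .. length ros - 1]]

halve :: [a] -> ([a], [a])
halve as = splitAt (div (length as + 1) 2) as

tw :: Int -> Int -> Complex Float
tw n k = cis (-2 * pi * fromIntegral k / fromIntegral n)

-- Depth-bounded parallel FFT: only the top d levels of the recursion
-- spark new work, so the total spark count stays small (at most 2^d)
-- rather than growing with every recursive call.
fftDepth :: Int -> [Complex Float] -> [Complex Float]
fftDepth _ [a] = [a]
fftDepth d as
  | d <= 0    = fft as                      -- fall back to the sequential code
  | otherwise = runEval $ do
      ls <- rpar (fftDepth (d - 1) cs)      -- left half in parallel
      rs <- rseq (fftDepth (d - 1) ds)      -- right half on this thread
      _  <- rseq ls
      return (interleave ls rs)
  where (cs, ds) = bflyS as
```

A depth of 3–4 already yields 8–16 sparks, which is usually enough to keep a typical machine's cores busy without flooding the spark pool.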
PAR MONAD
The answer from leftaroundabout helped me a lot in understanding how to apply data parallelism to the code. However, I have since studied the Par monad and tried to apply task parallelism with it. The problem is that it runs far slower than the original bflyS. I think the tasks my code creates are too expensive to spawn relative to the work each one does. Does anyone know how to use the Par monad in a better way?
-- New task-parallel bflyS
bflySTask :: [Complex Float] -> ([Complex Float], [Complex Float])
bflySTask as = runPar $ do
    let (ls, rs) = halve as
    v1 <- new
    v2 <- new
    let ros = DATA.zipWith (-) ls rs
    let aux = DATA.map (tw n) [0 .. n - 1]
    fork $ put v1 (DATA.zipWith (+) ls rs)
    fork $ put v2 (DATA.zipWith (*) ros aux)
    los <- get v1
    rts <- get v2
    return (los, rts)
  where
    n = DATA.length as
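Fork overhead is the likely culprit: for short lists the zipWiths finish faster than the scheduler can hand the work to another core. A common remedy is a size threshold below which the code stays sequential. The sketch below is my own variant, not from the post: it uses `Control.Monad.Par` from the `monad-par` package, the 256 cutoff is a placeholder to tune, and the question's sequential helpers are repeated so the snippet stands alone.

```haskell
import Data.Complex (Complex(..), cis)
import Control.Monad.Par (runPar, new, fork, put, get)

-- Sequential helpers, as in the question.
bflyS :: [Complex Float] -> ([Complex Float], [Complex Float])
bflyS as = (los, rts)
  where
    (ls, rs) = halve as
    los = zipWith (+) ls rs
    ros = zipWith (-) ls rs
    rts = zipWith (*) ros [tw (length as) i | i <- [0 .. length ros - 1]]

halve :: [a] -> ([a], [a])
halve as = splitAt (div (length as + 1) 2) as

tw :: Int -> Int -> Complex Float
tw n k = cis (-2 * pi * fromIntegral k / fromIntegral n)

-- Fork only when the lists are long enough for the work per task to
-- outweigh the scheduling overhead; below the cutoff, run sequentially.
bflySTask :: [Complex Float] -> ([Complex Float], [Complex Float])
bflySTask as
  | n < 256   = bflyS as            -- too small: forking costs more than it saves
  | otherwise = runPar $ do
      v1 <- new
      v2 <- new
      fork $ put v1 (zipWith (+) ls rs)
      fork $ put v2 (zipWith (*) ros aux)
      los <- get v1
      rts <- get v2
      return (los, rts)
  where
    n        = length as
    (ls, rs) = halve as
    ros      = zipWith (-) ls rs
    aux      = map (tw n) [0 .. n - 1]
```

Note that `put` fully forces its argument (it requires `NFData`), so each forked task really does all of its zipWith work before the `get`s unblock, which is exactly what makes the threshold worthwhile.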
First off: there's a lot of optimisation to be done here before I'd start to think about parallelism:
Lists rock, but their non-consecutive memory model means they just can't allow for traversals nearly as fast as tight arrays such as Data.Vector, because you inevitably end up with lots of cache misses. Indeed, I've seldom seen a list-based algorithm gain much from parallelisation, because it's bound by memory rather than CPU performance.

Your twiddle factors are computed over and over again; you can obviously gain a lot through memoisation here.

You keep on calling length, but that's an extremely wasteful function (O(n) for something that could be O(1)). Use some container that properly handles length; lists aren't meant to (we like to keep their ability to be infinite).
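The memoisation point can be sketched as follows: build the table of twiddle factors for a given size once, then index it in O(1) instead of re-evaluating `cis` per element per recursion level. This is my own illustration (the name `twiddleTable` is hypothetical, and it assumes the `vector` package), not code from the answer.

```haskell
import Data.Complex (Complex(..), cis, magnitude)
import qualified Data.Vector as V

-- Build all n twiddle factors for size n once; every later lookup is
-- an O(1) index rather than a fresh trigonometric evaluation.
twiddleTable :: Int -> V.Vector (Complex Float)
twiddleTable n =
  V.generate n (\k -> cis (-2 * pi * fromIntegral k / fromIntegral n))
```

`twiddleTable n V.! k` then replaces `tw n k`; in a recursive FFT the table for each smaller level is a stride-2 subset of the top-level one, so a single table for the full input size can serve every level.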
The parallelisation itself will be pretty simple; I'd check on the length as suggested by John L. Indeed, I'd rather require a pretty large size before sparking a thread, at least something like 256: since performance probably becomes crucial only at sizes of several thousand, this should still be enough threads to keep your cores busy.
import Data.Complex (Complex)
import qualified Data.Vector.Unboxed as UBV
import Control.Parallel.Strategies

type ℂ = Complex Float

fft' :: UBV.Vector ℂ -> UBV.Vector ℂ
fft' aₓs = interleave lᵥs rᵥs
  where (lᵥs, rᵥs) = (fft' lₓs, fft' rₓs)
                       `using` if UBV.length aₓs > 256
                                 then parTuple2 rseq rseq
                                 else r0
        (lₓs, rₓs) = bflyS aₓs
        -- interleave and bflyS need vector-based counterparts here