How to apply data parallelism on a Haskell Fast Fourier Transformation?
Problem description
I have Haskell code that computes a Fast Fourier Transformation, and I want to apply data parallelism to it. However, every strategy I have tried generates too many sparks, and most of them overflow.
Does anyone have an idea of how to apply a good data-parallelism strategy to the following algorithm?
-- radix 2 Cooley-Tukey FFT
fft :: [Complex Float] -> [Complex Float]
fft [a] = [a]
fft as  = interleave ls rs
  where
    (cs, ds) = bflyS as
    ls = fft cs
    rs = fft ds

interleave [] bs     = bs
interleave (a:as) bs = a : interleave bs as

bflyS :: [Complex Float] -> ([Complex Float], [Complex Float])
bflyS as = (los, rts)
  where
    (ls, rs) = halve as
    los = zipWith (+) ls rs
    ros = zipWith (-) ls rs
    rts = zipWith (*) ros [tw (length as) i | i <- [0 .. length ros - 1]]

halve as = splitAt n' as
  where
    n' = div (length as + 1) 2

-- twiddle factors
tw :: Int -> Int -> Complex Float
tw n k = cis (-2 * pi * fromIntegral k / fromIntegral n)
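One standard way to keep the spark count bounded is to parallelise only the top levels of the recursion. The sketch below is not from the original post: `fftDepth` and the depth cutoff are my own names and choices, using `rpar`/`rseq` from the `parallel` package's `Control.Parallel.Strategies`. The question's sequential definitions are repeated so the snippet compiles on its own. With depth `d`, at most 2^d sparks are created instead of one per recursive call.

```haskell
import Data.Complex (Complex(..), cis, magnitude)
import Control.Parallel.Strategies (runEval, rpar, rseq)

-- Sequential pieces, as in the question.
fft :: [Complex Float] -> [Complex Float]
fft [a] = [a]
fft as  = interleave (fft cs) (fft ds)
  where (cs, ds) = bflyS as

interleave :: [a] -> [a] -> [a]
interleave [] bs     = bs
interleave (a:as) bs = a : interleave bs as

bflyS :: [Complex Float] -> ([Complex Float], [Complex Float])
bflyS as = (los, rts)
  where
    (ls, rs) = halve as
    los = zipWith (+) ls rs
    ros = zipWith (-) ls rs
    rts = zipWith (*) ros [tw (length as) i | i <- [0 .. length ros - 1]]

halve :: [a] -> ([a], [a])
halve as = splitAt (div (length as + 1) 2) as

tw :: Int -> Int -> Complex Float
tw n k = cis (-2 * pi * fromIntegral k / fromIntegral n)

-- Depth-bounded parallel FFT: only the top d levels of the recursion
-- spark new work, so the total spark count stays small (at most 2^d)
-- rather than growing with every recursive call.
fftDepth :: Int -> [Complex Float] -> [Complex Float]
fftDepth _ [a] = [a]
fftDepth d as
  | d <= 0    = fft as                      -- fall back to the sequential code
  | otherwise = runEval $ do
      ls <- rpar (fftDepth (d - 1) cs)      -- left half in parallel
      rs <- rseq (fftDepth (d - 1) ds)      -- right half on this thread
      _  <- rseq ls
      return (interleave ls rs)
  where (cs, ds) = bflyS as
```

A depth of 3–4 already yields 8–16 sparks, which is usually enough to keep a typical machine's cores busy without flooding the spark pool.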
PAR MONAD
The answer from leftaroundabout helped me a lot in understanding how to apply data parallelism to the code. However, I have since studied the Par monad and tried to apply task parallelism with it. The problem is that it runs far slower than the original bflyS. I think the tasks my code creates are too expensive to spawn relative to the work each one does. Does anyone know how to use the Par monad in a better way?
-- New task-parallel bflyS
bflySTask :: [Complex Float] -> ([Complex Float], [Complex Float])
bflySTask as = runPar $ do
    let (ls, rs) = halve as
    v1 <- new
    v2 <- new
    let ros = DATA.zipWith (-) ls rs
    let aux = DATA.map (tw n) [0 .. n - 1]
    fork $ put v1 (DATA.zipWith (+) ls rs)
    fork $ put v2 (DATA.zipWith (*) ros aux)
    los <- get v1
    rts <- get v2
    return (los, rts)
  where
    n = DATA.length as
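Fork overhead is the likely culprit: for short lists the zipWiths finish faster than the scheduler can hand the work to another core. A common remedy is a size threshold below which the code stays sequential. The sketch below is my own variant, not from the post: it uses `Control.Monad.Par` from the `monad-par` package, the 256 cutoff is a placeholder to tune, and the question's sequential helpers are repeated so the snippet stands alone.

```haskell
import Data.Complex (Complex(..), cis)
import Control.Monad.Par (runPar, new, fork, put, get)

-- Sequential helpers, as in the question.
bflyS :: [Complex Float] -> ([Complex Float], [Complex Float])
bflyS as = (los, rts)
  where
    (ls, rs) = halve as
    los = zipWith (+) ls rs
    ros = zipWith (-) ls rs
    rts = zipWith (*) ros [tw (length as) i | i <- [0 .. length ros - 1]]

halve :: [a] -> ([a], [a])
halve as = splitAt (div (length as + 1) 2) as

tw :: Int -> Int -> Complex Float
tw n k = cis (-2 * pi * fromIntegral k / fromIntegral n)

-- Fork only when the lists are long enough for the work per task to
-- outweigh the scheduling overhead; below the cutoff, run sequentially.
bflySTask :: [Complex Float] -> ([Complex Float], [Complex Float])
bflySTask as
  | n < 256   = bflyS as            -- too small: forking costs more than it saves
  | otherwise = runPar $ do
      v1 <- new
      v2 <- new
      fork $ put v1 (zipWith (+) ls rs)
      fork $ put v2 (zipWith (*) ros aux)
      los <- get v1
      rts <- get v2
      return (los, rts)
  where
    n        = length as
    (ls, rs) = halve as
    ros      = zipWith (-) ls rs
    aux      = map (tw n) [0 .. n - 1]
```

Note that `put` fully forces its argument (it requires `NFData`), so each forked task really does all of its zipWith work before the `get`s unblock, which is exactly what makes the threshold worthwhile.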
First off: there's a lot of optimisation to be done here before I'd start to think about parallelism:
Lists rock, but their non-consecutive memory model means they just can't allow for traversals nearly as fast as tight arrays such as Data.Vector, because you inevitably end up with lots of cache misses. Indeed, I've seldom seen a list-based algorithm gain much from parallelisation, because it's bound by memory rather than CPU performance.

Your twiddle factors are computed over and over again; you can obviously gain a lot through memoisation here.

You keep on calling length, but that's an extremely wasteful function (O(n) for something that could be O(1)). Use some container that properly handles length; lists aren't meant to (we like to keep their ability to be infinite).
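The memoisation point can be sketched as follows: build the table of twiddle factors for a given size once, then index it in O(1) instead of re-evaluating `cis` per element per recursion level. This is my own illustration (the name `twiddleTable` is hypothetical, and it assumes the `vector` package), not code from the answer.

```haskell
import Data.Complex (Complex(..), cis, magnitude)
import qualified Data.Vector as V

-- Build all n twiddle factors for size n once; every later lookup is
-- an O(1) index rather than a fresh trigonometric evaluation.
twiddleTable :: Int -> V.Vector (Complex Float)
twiddleTable n =
  V.generate n (\k -> cis (-2 * pi * fromIntegral k / fromIntegral n))
```

`twiddleTable n V.! k` then replaces `tw n k`; in a recursive FFT the table for each smaller level is a stride-2 subset of the top-level one, so a single table for the full input size can serve every level.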
The parallelisation itself will be pretty simple; I'd check on the length as suggested by John L. Indeed, I'd rather require a pretty large size before sparking a thread, at least something like 256: since performance probably becomes crucial only at sizes of several thousand, this should still be enough threads to keep your cores busy.
import Data.Complex (Complex)
import qualified Data.Vector.Unboxed as UBV
import Control.Parallel.Strategies

type ℂ = Complex Float

fft' :: UBV.Vector ℂ -> UBV.Vector ℂ
fft' aₓs = interleave lᵥs rᵥs
  where (lᵥs, rᵥs) = (fft' lₓs, fft' rₓs)
                       `using` if UBV.length aₓs > 256
                                 then parTuple2 rseq rseq
                                 else r0
        (lₓs, rₓs) = bflyS aₓs
        -- interleave and bflyS need vector-based counterparts here