如何采取一个数组片断与Repa在一定范围内 [英] How to take an array slice with Repa over a range

查看:216
本文介绍了如何采取一个数组片断与Repa在一定范围内的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图推行以计算积分图像使用Repa的累加值功能。我的当前实现如下所示:

I am attempting to implement a cumulative sum function using Repa in order to calculate integral images. My current implementation looks like the following:

cumsum :: (Elt a, Num a) => Array DIM2 a -> Array DIM2 a
cumsum array = traverse array id cumsum' 
    where
        elementSlice inner outer = slice array (inner :. (0 :: Int))
        cumsum' f (inner :. outer) = Repa.sumAll $ elementSlice inner outer

的问题是在elementSlice功能。在MATLAB或说numpy的,这可能被指定为数组[内,0:外部]。所以,我在找的是沿着线的东西:

The problem is in the elementSlice function. In matlab or say numpy this could be specified as array[inner,0:outer]. So what I am looking for is something along the lines of:

slice array (inner :. (Range 0 outer))

然而,似乎切片不允许在Repa当前指定以上范围。我考虑过使用分区,在高效的并行模板卷积在Haskell的讨论,但是这似乎是一个相当重量级的方法,如果他们将改变每个迭代。我也算是屏蔽片(由二元矢量相乘) - 但这又似乎像它会在大型矩阵非常糟糕,因为执行我会在矩阵中的每个点......

However, it seems that slices are not allowed to be specified over ranges currently in Repa. I considered using partitions as discussed in Efficient Parallel Stencil Convolution in Haskell, but this seems like a rather heavyweight approach if they would be changed every iteration. I also considered masking the slice (multiplying by a binary vector) - but this again seemed like it would perform very poorly on large matrices since I would be allocating a mask vector for every point in the matrix...

我的问题 - 没有人知道是否有计划增加对一系列切片到Repa支持?还是有高性能的方法可以让我已经与不同的方法攻击这个问题,也许?

My question - does anyone know if there are plans to add support for slicing over a range to Repa? Or is there a performant way I can already attack this problem, maybe with a different approach?

推荐答案

其实,我认为主要的问题是,Repa没有扫描原始。 (但是,非常相似的文库加速的一样。)有扫描的两个变体,preFIX扫描和后缀扫描。由于一维数组

Actually, I think the main problem is that Repa does not have a scan primitive. (However, a very similar library Accelerate does.) There are two variants of scan, prefix scan and suffix scan. Given a 1D array

[A_1,...,A_N]

preFIX扫描返回

prefix scan returns

[0,A_0,A_0 + A_1,...,A_0 + ... + A_ {N-1}]

而后缀扫描产生

[A_0,A_0 + A_1,...,A_0 + A_1 + ... + A_N]

我想这是你要的东西与你的累计总和( cumsum )的功能。

I assume this is what you are going for with your cumulative sum (cumsum) function.

preFIX和后缀扫描推广很自然地到多维数组并基于简树的高效实现。就这个话题比较旧纸是扫描基元向量计算机 。此外,Conal埃利奥特最近写几个博客 的帖子在哈斯克尔推导高效的并行扫描。

Prefix and suffix scans generalise quite naturally to multi-dimensional arrays and have an efficient implementation based on tree reduction. A relatively old paper on the topic is "Scan Primitives for Vector Computers". Also, Conal Elliott has recently written several blog posts on deriving efficient parallel scans in Haskell.

的积分图像(2D阵列上)可以通过执行两次扫描一水平和一个垂直地进行计算。在没有扫描原始的我已经实现的,效率很低。

The integral image (on a 2D array) can be calculated by doing two scans, one horizontally and one vertically. In the absence of a scan primitive I have implemented one, very inefficiently.

horizScan :: Array DIM2 Int -> Array DIM2 Int
horizScan arr = foldl addIt arr [0 .. n - 1]
  where 
    addIt :: Array DIM2 Int -> Int -> Array DIM2 Int
    addIt accum i = accum +^ vs
       where 
         vs = toAdd (i+1) n (slice arr (Z:.All:.i))
    (Z:.m:.n) = arrayExtent arr

--
-- Given an @i@ and a length @len@ and a 1D array @arr@ 
-- (of length n) produces a 2D array of dimensions n X len.
-- The columns are filled with zeroes up to row @i@.
-- Subsequently they are filled with columns equal to the 
-- values in @arr.
--
toAdd :: Int -> Int -> Array DIM1 Int -> Array DIM2 Int
toAdd i len arr = traverse arr (\sh -> sh:.(len::Int)) 
               (\_ (Z:.n:.m) -> if m >= i then arr ! (Z:.n) else 0) 

,用于计算积分图像的功能可以被定义为

The function for calculating the integral image can then be defined as

vertScan :: Array DIM2 Int -> Array DIM2 Int
vertScan = transpose . horizScan . transpose

integralImage = horizScan . vertScan

由于扫描已实施加速,它不应该太硬将其添加到Repa。我不知道,如果使用现有的Repa原语的高效实现是可能或没有。

Given that scan has been implemented for Accelerate, it shouldn't be too hard to add it to Repa. I am not sure if an efficient implementation using the existing Repa primitives is possible or not.

这篇关于如何采取一个数组片断与Repa在一定范围内的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆