在Haskell中具有高性能的可变,随机访问数组/向量 [英] Mutable, random-access array/vector with high performance in haskell

查看:106
本文介绍了在Haskell中具有高性能的可变,随机访问数组/向量的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这个关于Haskell的话题讨论了很多(例如 mutable-array-implementation),但对于仍需要频繁修改和随机访问数组/向量的情况,我仍然不确定最佳做法是什么.

This a topic on Haskell discussed a lot (e.g. mutable-array-implementation), but I am still not sure what is the best practice for the case requiring frequent modification and random-access of an array/vector.

说一个长度为1,000,000的向量.对它的操作涉及根据输入访问它的(小的,例如1000个)子集,并根据输入修改值.此外,这种操作被重复2,000,000次.任务本身可以用纯数据结构实现,例如list,如下所示,尽管效率很低:

Say a vector of length 1,000,000. Operation on it involves accessing a (small, e.g 1000) subset of it based on input, and modifying the values based on the input. Furthermore, such operation are repeated 2,000,000 times. The task itself can be implemented in pure data structure such as list as the following for example, though very inefficient:

type Vect = [Int]

f :: Vect -> [[Int]] -> Vect
f x indsList = foldl g x indsList

-- g is just an example of random-access and modifications on the values.
g :: Vect -> [Int] -> Vect
g x inds = map h $ zip x [0..]
    where h (x, i) = if i `elem` inds then x !! i + 1 else x !! i

哈希/地图数据结构(例如IntMap)可用于有效的大量随机访问,但数组/向量也应这样做.更重要的是,仍然需要通过可变结构来避免大量修改,以避免内存复制.实际上,Haskell中存在可变的随机存取数组/向量吗?如果使用ST/IO Monad,此类控件是否会影响我的设置的性能?

Hash/Map data structure (e.g. IntMap) could be used for efficient large amounts of random-accesses, but array/vector should do it too. More importantly, the large amount of modifications is still need to be addressed by a mutable structure to avoid memory replication. Is there a mutable, random-acesss array/vector in Haskell indeed? If ST/IO Monads used, does such controls affect performance in my settings?

推荐答案

Haskell确实具有有效的可变数组.

Haskell does have efficient mutable arrays.

  • STUArray ,具有相当复杂但通常不必要的 Ix -索引编制方法,具有很多边界检查和很少的特殊优化,这使其速度比慢一些理论上是可能的.

  • There is STUArray, which has the quite sophisticated but often just unnecessary Ix-indexing methodology with many bounds-checks and little special optimisation, which makes it a bit slower than theoretically possible.

所有 Data.Vector 的开销很小,大量使用了流融合优化,更喜欢简单的类似列表"的界面.这意味着您实际上可以非常轻松地将示例直接轻松地转换为不可变向量,并且仍然可以获得比您期望的更好的性能:

All Data.Vector have very little overhead, make heavy use of stream fusion optimisation, prefer a simple, "list-like" interface. That means you can actually transfer your example very easily directly to immutable vectors, and may still get better performance than you might expect:

import Data.Vector.Unboxed as VU

type Vect = VU.Vector Int

f :: Vect -> [[Int]] -> Vect
f x indsList = VU.foldl g x indsList


g :: Vect -> [Int] -> Vect
g x inds = VU.zipWith h x [0..]
    -- h is just an example of modifications on the values.
    where h x i
           | i`elem`inds   = x VU.! i + 1
           | otherwise     = x VU.! i

是的,您可能想在 ST monad中进行可变更新.不知道您所说的这样的控件会影响性能"是什么意思:当编译器优化了行之有效的类型安全性之后,命令语言中实际上就不会出现任何控件".哪个GHC可以做得很好;使用 Data.Vector.Unboxed ,您可以非常接近C的性能.总会有一些不可避免的开销,但这主要与垃圾回收等问题有关,您在Java中也会遇到这种问题.

Yes, you probably want to work in the ST monad for mutable updates. Not sure what you mean by "does such controls affect performance": there isn't really any "control" that's not also present in imperative languages, once the compiler has optimised away the proven type-safety. Which GHC can do quite well; you can get pretty close to C performance with Data.Vector.Unboxed. There is always some inevitable overhead, but that has mostly to do with garbage-collection etc. issues, which you would also get in Java.

由于 ST IO 是monad,所以编译器实际上可以进行一些更高级的优化,这在命令式语言中是不可能的,尽管编译器是还没那么远.

Since ST and IO are monads, the compiler can in fact do some more high-level optimisations that wouldn't be possible in an imperative language, though compiler's aren't really that far yet.

许多地方都讨论了性能,尤其是数组操作的性能,例如RWH中的

Performance, particularly of array operations, is discussed in many places, for instance in RWH.

这篇关于在Haskell中具有高性能的可变,随机访问数组/向量的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆