在函数式编程中如何实现集合的内存效率非破坏性操作? [英] How is memory-efficient non-destructive manipulation of collections achieved in functional programming?

查看:187
本文介绍了在函数式编程中如何实现集合的内存效率非破坏性操作?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图弄清楚在功能编程中如何实现大型集合的非破坏性操作,即。如何可以更改或删除单个元素,而无需创建一个全新的集合,其中所有元素,甚至未修改的元素都将在内存中复制。 (即使原始收藏将被垃圾回收,我也希望这样一个集合的内存足迹和一般性能可怕。)

I'm trying to figure out how non-destructive manipulation of large collections is implemented in functional programming, ie. how it is possible to alter or remove single elements without having to create a completely new collection where all elements, even the unmodified ones, will be duplicated in memory. (Even if the original collection would be garbage-collected, I'd expect the memory footprint and general performance of such a collection to be awful.)

使用F#,我想出了一个函数 insert ,将一个列表分成两件,并介绍了一个新的元素,似乎没有克隆所有不变的元素:

Using F#, I came up with a function insert that splits a list into two pieces and introduces a new element in-between, seemingly without cloning all unchanged elements:

// return a list without its first n elements:
// (helper function)
let rec skip list n =
    if n = 0 then
        list
    else
        match list with
        | []    -> []
        | x::xs -> skip xs (n-1)

// return only the first n elements of a list:
// (helper function)
let rec take list n =
    if n = 0 then
        []
    else
        match list with
        | []    -> []
        | x::xs -> x::(take xs (n-1))

// insert a value into a list at the specified zero-based position:
let insert list position value =
    (take list position) @ [value] @ (skip list position)

然后我检查是否对象原始列表通过使用.NET的 Object.ReferenceEquals

I then checked whether objects from an original list are "recycled" in new lists by using .NET's Object.ReferenceEquals:

open System

let (===) x y =
    Object.ReferenceEquals(x, y)

let x = Some(42)
let L = [Some(0); x; Some(43)]
let M = Some(1) |> insert L 1

以下三个表达式都评估为 true ,表示 x 所引用的值在列表 L M ,即在内存中只有1份此值:

The following three expressions all evaluate to true, indicating that the value referred to by x is re-used both in lists L and M, ie. that there is only 1 copy of this value in memory:

L.[1] === x
M.[2] === x
L.[1] === M.[2]



我的问题:



语言通常会重新使用值,而不是将其克隆到新的内存位置,或者我只是幸运的F#的行为?假设是前者,这样可以在函数式编程中实现集合的高效编辑效果如何?

My question:

Do functional programming languages generally re-use values instead of cloning them to a new memory location, or was I just lucky with F#'s behaviour? Assuming the former, is this how reasonably memory-efficient editing of collections can be implemented in functional programming?

(Btw:我知道关于克里斯Okasaki的书纯功能数据结构 ,但还没有时间阅读彻底的。)

(Btw.: I know about Chris Okasaki's book Purely functional data structures, but haven't yet had the time to read it thoroughly.)

推荐答案


我试图弄清楚如何
无损在
函数式编程中实现大
集合的操纵,即。如何
可以更改或删除单个
元素,而无需创建
全新集合,其中所有
元素,即使是未修改的,
将被重复在记忆中。

I'm trying to figure out how non-destructive manipulation of large collections is implemented in functional programming, ie. how it is possible to alter or remove single elements without having to create a completely new collection where all elements, even the unmodified ones, will be duplicated in memory.

这个页面在F#中有一些数据结构的描述和实现。大部分来自冈崎的纯功能数据结构,尽管AVL树是我自己的实现,因为它不在书中。

This page has a few descriptions and implementations of data structures in F#. Most of them come from Okasaki's Purely Functional Data Structures, although the AVL tree is my own implementation since it wasn't present in the book.

现在,由于你问过,关于重用未修改的节点,我们来看一个简单的二叉树:

Now, since you asked, about reusing unmodified nodes, let's take a simple binary tree:

type 'a tree =
    | Node of 'a tree * 'a * 'a tree
    | Nil

let rec insert v = function
    | Node(l, x, r) as node ->
        if v < x then Node(insert v l, x, r)    // reuses x and r
        elif v > x then Node(l, x, insert v r)  // reuses x and l
        else node
    | Nil -> Node(Nil, v, Nil)

请注意,我们重新使用我们的一些节点。假设我们从这棵树开始:

Note that we re-use some of our nodes. Let's say we start with this tree:

当我们在树中插入一个 e 我们得到一个全新的树,其中一些节点指向我们原始的树:

When we insert an e into the the tree, we get a brand new tree, with some of the nodes pointing back at our original tree:

如果我们没有引用 xs 树,那么.NET将垃圾收集任何没有实时参考的节点,特别是 d g f 节点。

If we don't have a reference to the xs tree above, then .NET will garbage collect any nodes without live references, specifically thed, g and f nodes.

请注意,我们只修改了节点>沿着我们插入的节点的路径。这在大多数不可变数据结构中是非常典型的,包括列表。因此,我们创建的节点数量正好等于我们需要遍历的节点数量才能插入到我们的数据结构中。

Notice that we've only modified nodes along the path of our inserted node. This is pretty typical in most immutable data structures, including lists. So, the number of nodes we create is exactly equal to the number of nodes we need to traverse in order to insert into our data structure.


功能编程语言
通常重新使用值而不是
将它们克隆到新的内存位置,
或者我只是幸运地使用F#的
行为?假设前者是
这样可以有效地节省内存的
编辑集合可以在函数式编程中实现

Do functional programming languages generally re-use values instead of cloning them to a new memory location, or was I just lucky with F#'s behaviour? Assuming the former, is this how reasonably memory-efficient editing of collections can be implemented in functional programming?

是的。

但是,列表不是一个非常好的数据结构,因为大多数非平凡的操作需要O(n)时间

Lists, however, aren't a very good data structure, since most non-trivial operations on them require O(n) time.

平衡二叉树支持O(log n)插入,这意味着我们在每个插入上创建O(log n)个副本。由于log 2(10 ^ 15)=〜50,对于这些特定数据结构,开销非常小。即使您在插入/删除后保持每个对象的每个副本,您的内存使用率将以O(n log n)的速度增加 - 在我看来非常合理。

Balanced binary trees support O(log n) inserts, meaning we create O(log n) copies on every insert. Since log2(10^15) is ~= 50, the overhead is very very tiny for these particular data structures. Even if you keep around every copy of every object after inserts/deletes, your memory usage will increase at a rate of O(n log n) -- very reasonable, in my opinion.

这篇关于在函数式编程中如何实现集合的内存效率非破坏性操作?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆