我需要采取明确的行动来促进持久性数据结构的共享吗? [英] Do I need to take explicit actions to facilitate sharing with persistent data structures?

查看:160
本文介绍了我需要采取明确的行动来促进持久性数据结构的共享吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我来自一个强制性的背景,正在尝试实现一个简单的不相交集(union-find)数据结构,以便在Haskell中创建和修改(持久化)数据结构来实践。目标是简单实现,但我也关心效率,我的问题与此有关。



首先,我创建了一个不相交的林实现通过等级联合,并通过定义点的数据类型开始:

  data Point = Point 
{ _value :: Int
,_parent :: Maybe Point
,_rank :: Int
}派生显示

一个不相交的集合森林是一个 IntMap Int→Point 映射: p>

  type DSForest = IntMap Point 

empty :: DSForest
empty = I.empty

单例集只是从其值 x 到具有值的点的映射 x ,没有父级和等级1:

  makeSet :: DSForest  - > Int  - > DSForest 
makeSet dsf x = I.insert x(Point x Nothing 0)dsf

现在,有趣的部分 - union 。此操作将通过将另一个点设置为其父点(并且在某些情况下更改其等级)来修改点。在 Point s'等级不同的情况下, Point 只是更新(一个新的点是创建)使其父指向另一个。在他们相等的情况下,创建一个新的,其等级增加一个:

  union :: DSForest  - > Int  - > Int  - > DSForest 
union dsf x y | x == y = dsf
union dsf xy =
如果_value x'== _value y'
then dsf
else case compare(_rank x')(_rank y')的
GT - > I.insert(_value y')y'{_parent = Just x'} dsf
LT - > I.insert(_value x')x'{_parent = Just y'} dsf
- 1)将x的等级增加1:
EQ - >让x''= x'{_rank = _rank x'+ 1}
- 2)更新x的等级的值以指向新的x:
dsf'= I.insert(_value x '')x''dsf
- 3)然后更新y以使新的x作为其父项:
在I.insert(_value y')y'{_parent = Just x''} dsf'
其中x'= dsf! findSet dsf x
y'= dsf! findSet dsf y

现在,对于我真正的问题,如果在 EQ case我改为完成以下内容:

  EQ  - >让dsf'= I.insert(_value x')x'{_rank = _rank x'+ 1} dsf 
in I.insert(_value y')y'{_parent = Just x'{_rank = _rank x '+ 1}} dsf'

首先插入新的 Point x ,其等级增加,然后具有 y'父母是一个新的 x ,其排名增加,这意味着他们不再指向相同的 c $ c在内存中?(这是否甚至重要?在使用/创建持久数据结构时,我应该担心这些事情吗?)



只是为了完整,这里是 findSet

  findSet :: DSForest  - > Int  - > Int 
findSet dsf'x'= case _parent(dsf'!x')
Just(Point v _ _) - > findSet dsf'v
没有 - > x'

(关于此代码的效率和设计的一般注释也是受欢迎的。)

解决方案

共享是一个编译器的东西。当它识别公共子表达式时,编译器可以选择由内存中相同的对象来表示它们。但是即使你使用这样一个编译器开关(如 -fno-cse ),没有任何义务,而这两个可能是(通常是不存在切换)由内存中的两个不同的,尽管相同的值表示。 Re:参考透明度



当我们命名东西并使用这个名字两次时,我们(合理地)期望它在内存中表示相同的对象。但是编译器可能会选择复制它,并在两个不同的使用站点中使用两个单独的副本,尽管不知道这样做。但它可能。 Re:参考透明度



另见:








以下是列表生成函数的几个例子,从上面的最后一个链接绘制。他们依赖于编译器不重复任何内容,即确实按照需要从调用需要共享任何命名对象 lambda演算操作语义(由nponeccop在评论中解释),而不会在其上引入任何额外的共享自己消除共同的子表达式:


  1. 共享fixpoint组合器,创建一个循环:



    修复f = x其中x = fx


  2. 非共享固定点组合器,创建伸缩式多级链(即常规递归链)



    _Y f = f(_Y f)

    / li>
  3. 两阶段组合 - 循环和Feed



    _2 f = f(fix f)



I come from an imperative background and am trying to implement a simple disjoint sets ("union-find") data structure to get some practice with creating and modifying (persistent) data structures in Haskell. The goal is to have a simple implementation, but I am also concerned about efficiency, and my question is related to this.

First, I created a disjoint-set forest implementation with union by rank and started by defining a data type for a "point":

data Point = Point
  { _value  :: Int
  , _parent :: Maybe Point
  , _rank   :: Int
  } deriving Show

A disjointed set forest is an IntMap with Int → Point mappings:

type DSForest = IntMap Point

empty :: DSForest
empty = I.empty

A singleton set is simply a mapping from its value x to a Point with value x, no parent and a rank of 1:

makeSet :: DSForest -> Int -> DSForest
makeSet dsf x = I.insert x (Point x Nothing 0) dsf

Now, the interesting part – union. This operation will modify a point by setting the other point as its parent (and in some cases change its rank). In the case where the Points' rank are different, the Point is simply "updated" (a new Point is created) to have its parent point to the other. In the case where they are equal, a new Point is created with its rank increased by one:

union :: DSForest -> Int -> Int -> DSForest
union dsf x y | x == y = dsf 
union dsf x y = 
   if _value x' == _value y'
      then dsf 
      else case compare (_rank x') (_rank y') of
                  GT -> I.insert (_value y') y'{ _parent = Just x' } dsf 
                  LT -> I.insert (_value x') x'{ _parent = Just y' } dsf 
                            -- 1) increase x's rank by one:
                  EQ -> let x''  = x'{ _rank = _rank x' + 1 } 
                            -- 2) update the value for x's rank to point to the new x:
                            dsf' = I.insert (_value x'') x'' dsf 
                            -- 3) then update y to have the new x as its parent:
                         in I.insert (_value y') y'{ _parent = Just x'' } dsf'
  where x' = dsf ! findSet dsf x
        y' = dsf ! findSet dsf y

Now, to my real question, if in the EQ case I had instead done the following:

EQ -> let dsf' = I.insert (_value x') x'{ _rank = _rank x' + 1} dsf 
       in I.insert (_value y') y'{ _parent = Just x'{ _rank = _rank x' + 1 }} dsf'

I.e. first insert a new Point x with its rank increased, and then having y''s parent be a new Point x with its rank increased, would this mean that they no longer point to the same Point in memory? (Does this even matter? Should I worry about these things when using/creating persistent data structures?)

And just for completeness, here is findSet:

findSet :: DSForest -> Int -> Int
findSet dsf' x' = case _parent (dsf' ! x') of
                     Just (Point v _ _)  -> findSet dsf' v
                     Nothing             -> x'

(General comments about the efficiency and design of this code are also welcome.)

解决方案

Sharing is a compiler thing. When it recognizes common sub-expressions, a compiler may chose to represent them both by the same object in memory. But even if you use such a compiler switch (like -fno-cse), it is under no obligation to do so, and the two might be (and usually are, in the absence of the switch) represented by two different, though of equal value, objects in memory. Re: referential transparency.

OTOH when we name something and use that name twice, we (reasonably) expect it to represent the same object in memory. But compiler might choose to duplicate it and use two separate copies in two different use sites, although it is not known to do so. But it might. Re: referential transparency.

See also:


Here's few examples with list-producing functions, drawing from the last link above. They rely on the compiler not duplicating anything, i.e. indeed sharing any named object as expected from call by need lambda calculus operational semantics (as explained by nponeccop in the comments), and not introducing any extra sharing on its own to eliminate common subexpressions:

  1. Sharing fixpoint combinator, creating a loop:

    fix f = x where x = f x

  2. Non-sharing fixpoint combinator, creating telescoping multistage chain (i.e. regular recursion chain)

    _Y f = f (_Y f)

  3. Two-stages combination - a loop and a feed

    _2 f = f (fix f)

这篇关于我需要采取明确的行动来促进持久性数据结构的共享吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆