Mathematica 中的树数据结构 [英] Tree data structure in Mathematica

查看:34
本文介绍了Mathematica 中的树数据结构的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我主要将 mathematica 用作数学工作台和编写相对较小的临时程序.然而,我正在设计一个我打算在 Mathematica 中编程的系统.我需要将数据存储在树中,并搜索和遍历树.虽然我知道如何实现一棵树,但我更喜欢标准的、经过测试的代码.我在 Mathematica 用户 wiki 中查看了基本数据结构的包类型.我没有找到,尽管 Mathematica 文档中有一个小例子.

I have used mathematica mostly as a mathematics workbench and for writing relatively small ad-hoc programs. I am however designing a system which I intend to program in Mathematica. I need to store data in a tree, and search and traverse the tree. Although I know how to implement a tree, I prefer standard, tested code. I looked at what sort of packages there are for basic datastructures at the Mathematica users wiki. I have found none, although there is a small example in the Mathematica documentation.

现在我的问题:

  1. 是否有可用的数据结构(开源)包?

  1. Is there an ( open source ) package for data structures available somewhere?

你在数据结构方面使用了什么方法?逐渐开发自己的 util 包?

What approach have you used with regard to data structures? Gradually developing your own util package?

(不是问题,只是评论.也许……缺乏(很多可用的)开源包是 Mathematica 没有应有的势头的原因.恐怕是先有鸡还是先有蛋的问题.)

( Not a question, just a remark. Maybe... the lack of ( lots of available ) open source packages is the reason why Mathematica doesn't have the momentum it deserves. A chicken / egg issue, I am afraid. )

推荐答案

在 Mathematica 中,您所做的大部分工作都是基于表达式.表达式自然具有树结构.对于深度优先遍历(这可能是最常见的),您可以使用诸如 ScanMapCases 等函数.更传统的语言是没有简单的方法来保存表达式树中单个节点的身份,因为 Mathematica 中没有指针.此外,在 Mathematica 中对惯用表达式的许多操作会在您只需要在几个地方修改时复制整个表达式,因为表达式是不可变的.

In Mathematica, most of what you do is based on expressions. Expressions naturally have the tree structure. For depth-first traversals (which are probably most common), you can then use functions like Scan,Map, Cases etc. The difference w.r.t the more traditional languages is that there is no simple way to preserve the identity of individual node in an expression tree, since there are no pointers in Mathematica. Also, many operations on expressions that are idiomatic in Mathematica would copy the entire expression when you only need to modify it in a few places, because expressions are immutable.

使用不可变的 Mathematica 表达式作为树仍然有几个优点.一个是,因为它们是不可变的,只需查看它们就很容易理解它们存储的内容(状态和行为不是混合的).另一个是有高效和通用的函数,例如 MapMapIndexedScan,可以遍历它们.例如,访问者设计模式是 invisible - 它只是 Map[f,tree,Infinity],内置于语言中.此外,还有诸如CasesReplaceReplaceAll 等内置函数,它们允许编写非常简洁和声明性的代码来解构树,找到具有某种语法或满足某种条件的树块等.由于树不限于仅从列表构建和从不同的头构建,因此可以有效地使用它来编写非常简洁的树处理代码.最后,本着 探索性和自下而上的编程,缩短了开发周期并最终导致更好的设计.

Using immutable Mathematica expressions as trees still has several advantages. One is that, because they are immutable, it is easy to understand what they store by just looking at them (state and behavior are not mixed). Another is that there are efficient and generic functions such as Map, MapIndexed or Scan, that traverse them. For example, the visitor design pattern is invisible - it is just Map[f,tree,Infinity], built into the langauge. Also, there are built-in functions such as Cases, Replace, ReplaceAll, etc, which allow one to write very concise and declarative code to destructure trees, find pieces of trees with certain syntax or satisfying some condition, etc. Since trees are not restricted to only be built from lists and be built from different heads, one can effectively use this to write very concise tree-processing code. Finally, a freedom to very easily build any tree structure you want makes it much easier to perform experiments and prototype things, in the spirit of exploratory and bottom-up programming, which shortens the development cycle and ultimately leads to better designs.

也就是说,您当然可以实现有状态"(可变)树数据结构.我怀疑它尚未完成的真正原因通常是与构建、修改和遍历这样的树相关的性能损失,因为它将在每一步都经历一个完整的符号评估过程(参见 这篇 帖子了解更多详情).例如,关于如何在 Mathematica 上下文中使用二叉搜索树以获得非常高效的代码的 2 个示例,请参阅我的帖子 此处(通用符号设置)和此处(在已编译代码的上下文中).对于在 Mathematica 中惯用地构建数据结构的一般方法,我推荐 Roman Maeder 的书籍:Programming in Mathematica"、Mathematica 程序员 I&II",尤其是Mathematica 中的计算机科学".在后者中,他详细讨论了如何在 Mathematica 中实现二叉搜索树.编辑 正如@Simon 提到的,@Daniel Lichtblau 的演讲也是一个很好的资源,它展示了如何构建数据结构并使其高效.

That said, you can certainly implement "stateful" (mutable) tree data structure. The real reason it has not been done yet generally is, I suspect, the performance hit associated with building, modifying and traversing such a tree, since it will undergo a full symbolic evaluation process at every step (see this post for more details on that). For 2 examples of how, for example, a binary search tree can be used in Mathematica context for pretty efficient code, see my posts here (generic symbolic setting) and here (in the context of Compiled code). For general ways to construct data structures idiomatically in Mathematica, I recommend books of Roman Maeder: "Programming in Mathematica", "Mathematica programmer I&II", and especially "Computer Science in Mathematica". In the latter he has a detailed discussion of how to implement binary search tree in Mathematica. EDIT As @Simon mentioned, the talk of @Daniel Lichtblau is also a great resource, which shows how to build data structures and make them efficient.

关于在 Mathematica 中实现包含某些状态的数据结构的一般方法,这里是从我在 this Mathgroup 线程 - 它实现了配对"数据结构.

Regarding general ways to implement data structures in Mathematica which would incorporate some state, here is a simple example extracted from my post in this Mathgroup thread - it implements a "pair" data structure.

Unprotect[pair, setFirst, getFirst, setSecond, getSecond, new, delete];
ClearAll[pair, setFirst, getFirst, setSecond, getSecond, new, delete];
Module[{first, second},
  first[_] := {};
  second[_] := {};
  pair /: new[pair[]] := pair[Unique[]];
  pair /: pair[tag_].delete[] := (first[tag] =.; second[tag] =.);
  pair /: pair[tag_].setFirst[value_] := first[tag] = value;
  pair /: pair[tag_].getFirst[] := first[tag];
  pair /: pair[tag_].setSecond[value_] := second[tag] = value;
  pair /: pair[tag_].getSecond[] := second[tag];
  Format[pair[x_Symbol]] := "pair[" <> ToString[Hash[x]] <> "]";
];
Protect[pair, setFirst, getFirst, setSecond, getSecond, new, delete]; 

以下是您可以使用它的方法:

Here is how you could use it:

pr = new[pair[]];
pr.setFirst[10];
pr.setSecond[20];
{pr.getFirst[], pr.getSecond[]}

{10, 20}

创建一个新的配对对象列表:

Creating a list of new pair objects :

pairs = Table[new[pair[]], {10}]

{"pair[430427975]", "pair[430428059]", "pair[430428060]", "pair[430428057]",
"pair[430428058]", "pair[430428063]", "pair[430428064]", "pair[430428061]", 
"pair[430428062]", "pair[430428051]"}

设置字段:

Module[{i},
 For[i = 1, i <= 10, i++,
  pairs[[i]].setFirst[10*i];
  pairs[[i]].setSecond[20*i];]]

检查字段:

#.getFirst[] & /@ pairs

{10, 20, 30, 40, 50, 60, 70, 80, 90, 100}

#.getSecond[] & /@ pairs

{20, 40, 60, 80, 100, 120, 140, 160, 180, 200} 

在我提到的帖子中有更详细的讨论.以这种方式创建的对象"的一大问题是它们没有自动垃圾回收,这可能是顶级 Mathematica 中实现的 OOP 扩展本身并没有真正起飞的主要原因之一.

In the post I mentioned there is a more detailed discussion. One big problem for "objects" created in this way is that there is no automatic garbage collection for them, which may be one of the major reasons why OOP extensions implemented in top-level Mathematica itself did not really take off.

Mathematica 有几个 OOP 扩展,例如 Roman Maeder 的 classes.m 包(来源在他的Mathematica Programmer"一书中),Objectica商业包,以及其他几个.但是,除非 Mathematica 本身提供有效的机制(可能基于某种指针或引用机制)来构建可变数据结构(如果发生这种情况),否则可能会出现与此类数据结构的顶级实现相关的巨大性能损失在 mm.另外,由于 mma 是基于不变性作为核心思想之一,因此要使可变数据结构与 Mathematica 编程的其他习语很好地契合并不容易.

There are several OOP extensions for Mathematica, such as the classes.m package by Roman Maeder (the source is in his "Mathematica Programmer" book), the Objectica commercial package, and several others. But until Mathematica itself would provide efficient mechanisms (perhaps based on some kind of pointer or reference mechanism) for building mutable data structures (if this ever happens), there will probably be a large performance hit associated with top-level implementations of such data structures in mma. Also, since mma is based on immutability as one of the core ideas, it is not very easy to make mutable data structures fit well with other idioms of Mathematica programming.

编辑

这是一个与上述示例一致的基本状态树实现:

Here is a bare-bones stateful tree implementation along the lines of the example above:

Module[{parent, children, value},
  children[_] := {};
  value[_] := Null;
  node /: new[node[]] := node[Unique[]];
  node /: node[tag_].getChildren[] := children[tag];
  node /: node[tag_].addChild[child_node, index_] := 
        children[tag] = Insert[children[tag], child, index];
  node /: node[tag_].removeChild[index_] := 
        children[tag] = Delete[children[tag], index];
  node /: node[tag_].getChild[index_] := children[tag][[index]];
  node /: node[tag_].getValue[] := value[tag];
  node /: node[tag_].setValue[val_] := value[tag] = val;
];

一些使用示例:

In[68]:= root = new[node[]]

Out[68]= node[$7]

In[69]:= root.addChild[new[node[]], 1]

Out[69]= {node[$8]}

In[70]:= root.addChild[new[node[]], 2]

Out[70]= {node[$8], node[$9]}

In[71]:= root.getChild[1].addChild[new[node[]], 1]

Out[71]= {node[$10]}

In[72]:= root.getChild[1].getChild[1].setValue[10]

Out[72]= 10

In[73]:= root.getChild[1].getChild[1].getValue[]

Out[73]= 10

有关使用这种可变树数据结构的一个重要示例,请参阅 这个 我的帖子.它还将这种方法与更大量重用 Mathematica 原生数据结构和函数的方法进行了对比,并很好地说明了本文开头讨论的要点.

For one non-trivial example of use of this mutable tree data structure, see this post of mine. It also confronts this method with the one more heavily reusing Mathematica native data structures and functions, and illustrates well the points discussed at the beginning of this post.

这篇关于Mathematica 中的树数据结构的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆