麻烦实施“绳索”数据结构 [英] Trouble implementing a "rope" data structure in C++

查看:174
本文介绍了麻烦实施“绳索”数据结构的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想创建一个绳子数据结构。它是一种二叉树类型,即一个递归的数据结构。



绳子的目的是分裂和连接应该是快速

例如,用户应该能够说 rope1 + rope2



但是,这提出了一个问题:

如果一个绳子被修改,其父节点被间接修改。

因为我的目标是让 string 替换 string 是不可接受的。



我的解决方案是:每当绳子 code>,我将创建一个节点,稍作更改,保留旧的未修改。



认为这将工作得很好。



在实践中,它涉及到一个堆分配用于(几乎?)每个修改的字符串。

即使是一个字符的改变也会导致一个新的堆对象,这不仅本身很慢,而且会显着降低内存本地性,对性能有负面影响。



我应该如何解决这个问题?

解决方案

写:为此,你需要重新计算每个分配的节点。



如果修改的节点有一个refcount为1(没有人指的是它)



这个实际问题是有用地从非变异操作中隔离:

  char& rope :: operator [](std :: string :: pos)

em> mutate被引用的char,但是当它实际上不会被强制选择const重载时,没有任何琐碎的方法。这意味着你不得不假设变异会发生,并可能触发一个不必要的副本,或者返回一些重载char转换和赋值的代理。



早期实现的 std :: string (其中一个字符串相当于一个单节点绳)iirc,并失去了青睐;部分是因为突变问题,部分是因为如果你必须担心多线程,COW和所需的引用计数变得越来越昂贵。






正如你所说,如果你的节点包含两个独立的状态类型:绳子和它自己的字符串和对它的子节点的引用(因为这会导致引用计数/子引用变量传播到树)。 / p>

如果相反,字符仅存储在叶节点,并且您执行非叶节点的完整副本它自己的目录结构)你仍然避免复制的字符,并且引用的共享状态是更简单。



这是否得到对数时间级联你想要的?也许不是:




  • 您必须复制所有非叶节点(并添加一个新根),这些的数量是log



> 更接近线性或对数时间取决于增加引用计数与复制目录树的相对成本。



没有这个,你得到快速连接,但任意字符访问可能(不可预测地)退化到对数时间,如果他们必须在树上传播一个COW操作。



我想,如果它适合您的用例,您可以实现移动连接:这将提供可能的恒定时间添加,您仍然避免额外的COW复杂性。


I'm trying to make a rope data structure. It's a type of binary tree, i.e. a recursive data structure.

The purpose of a rope is that splitting and concatenation should be fast, which means you avoid copying entire ropes.
So, for example, a user should be able to say rope1 + rope2 and expect a result in ~logarithmic time.

However, this presents a problem:
If a rope is modified, its parents are indirectly modified as well.
Because my goal is to make rope a drop-in replacement for string, this is not acceptable.

My solution to this is: Whenever there is any kind of change to a rope, I would create a new node, with a slight change, leaving the old ones unmodified.

In theory, I think this would work rather well.

In practice, though, it involves a heap allocation for (almost?) every modification of the strings.
Even a one-character change would result in a new heap object, which is not only slow by itself, but which also significantly decreases memory locality, very negatively impacting performance.

How should I go about solving this problem?

解决方案

The traditional approach would be copy-on-write: for this, you need to refcount each allocated node.

If the modified node has a refcount of 1 (no-one else refers to it), you don't need to duplicate it.

The practical problem with this is usefully segregating mutating from non-mutating operations:

    char& rope::operator[] (std::string::pos)

may mutate the referenced char, but there's no trivial way to force selection of the const overload when it actually won't. This means you either have to assume mutation will happen, and possibly trigger an unnecessary copy, or return instead some proxy which overloads char conversion and assignment.

This approach was tried for early implementations of std::string (where a string is equivalent to a single-node rope) iirc, and fell out of favour; partly because of the mutation problem, and partly because COW and the required refcounting become increasingly expensive if you have to worry about multi-threading.


As you say, the rope has an additional problem if your node contains two independent types of state: its own string and references to its child nodes (since this causes refcount/child reference mutations to propogate up the tree).

If instead the characters are stored only at the leaf nodes, and you do a full copy of the non-leaf nodes (so each rope has its own "directory structure") you still avoid copying the characters, and the refcounted shared state is much simpler.

Does this get the logarithmic-time concatenation you want? Perhaps not:

  • you have to copy all the non-leaf nodes (and add a new root), and the number of these is log the number of leaves
  • you also have to increment the leaf refcounts though, which is linear

Whether it looks closer to linear or logarithmic time depends on the relative cost of incrementing a refcount versus copying the directory tree.

Without this though, you get fast concatenations but arbitrary character accesses may (unpredictably) degenerate to logarithmic time if they have to propagate a COW operation up the tree.

I guess, if it suits your use case, you could implement move concatenations: this would give possibly constant-time addition and you still avoid the extra COW complexity.

这篇关于麻烦实施“绳索”数据结构的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆