最有效的数据结构为文本添加样式 [英] Most efficient data structure to add styles to text

查看:247
本文介绍了最有效的数据结构为文本添加样式的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在寻找最好的数据结构来添加文本的样式(比如在文本编辑器中)。结构应该允许以下操作:


  1. 快速查找绝对位置X的所有样式

  2. 在任何位置快速插入文本(必须移动该位置后的样式)。

  3. 文本的每个位置都必须支持任意数量的样式(重叠)。

我考虑了包含文本范围的列表/数组,但不允许快速插入,而不重新计算插入点后所有样式的位置。 p>

具有相对偏移量的树结构支持#2,但当我向文本添加大量样式时,树将退化。



任何其他选项?

解决方案

我从来没有开发过一个编辑器,但是如何做:



我相信有可能扩展用于存储文本字符的方案,具体取决于您的实现细节(languag e,工具包等)以及您的性能和资源使用要求。



而不是为样式使用单独的数据结构,我宁愿拥有每个字符并指向具有适用字符的数组或列表。具有相同集合样式的字符可以指向相同的数组或列表,以便可以共享。



字符插入和删除不会影响样式本身,分开改变引用次数可以通过一些引用计数来处理。



根据您的编程语言,您甚至可以通过指向中途进入列表来压缩一些东西,尽管这样做的额外的簿记实际上可能使其效率更低。 / p>

这个建议的主要问题是内存使用。在用C编写的ASCII编辑器中,将指针与每个字符串绑定将由于结构对齐填充而在64位系统上将其有效内存使用从1字节提高到12字节。



我会考虑将文本分解成小的可变大小块,这将允许您有效地压缩指针。例如。一个32个字符的块可能在C中看起来像这样:

  struct _BLK_ {
unsigned char size;
无符号int样式;
char content [];
}

有趣的部分是对结构变量部分的元数据处理,包含存储的文本和任何样式指针。 size元素将指示字符数。整数(因此是32个字符的限制)的样式将被看作是一组32个1位字段,每个字段都显示一个字符是否有自己的样式指针,或者是否应该使用与上一个字符相同的样式。这样一个具有单一样式的32-char块只会占用大小为char,样式掩码和单个指针的额外开销以及任何填充字节。将字符插入和删除这样的小数组应该是相当快的。



对于文本存储本身,一棵树听起来好像是一个好主意。也许是二叉树,其中每个节点值将是子值的总和,叶节点最终指向文本块,其大小作为其节点值?根节点值将是文本的总大小,每个子树理想地保留文本的一半。你仍然必须自动平衡它,有时需要合并半空的文本块。



如果你错过了,我不是专家在树中: - )



编辑:



显然我建议的是这个数据结构的修改版本:



http:// en。 wikipedia.org/wiki/Rope_%28computer_science%29



在本文中引用:



文本编辑器的数据结构



编辑2:



所提出的数据结构中的删除应该相对较快,因为它会降低数组中的字节移位对样式掩码进行按位操作。插入几乎相同,除非块填满。在每个块中保留一些空间(即样式掩码中的某些位)可能是有意义的,以便将来可以直接在块中插入,而不必为相对较少量的新文本更改树本身。



在这样的块中捆绑字符和样式的另一个优点是,其固有的数据位置应该允许比其他替代方案更有效地使用CPU缓存,从而在一定程度上提高处理速度。



与任何复杂的数据结构一样,您可能需要使用代表性的测试用例或自适应算法进行分析,以确定其操作的最佳参数(块大小,任何预留空间等)。


I'm looking for the best data structure to add styles to a text (say in a text editor). The structure should allow the following operations:

  1. Quick lookup of all styles at absolute position X
  2. Quick insert of text at any position (styles after that position must be moved).
  3. Every position of the text must support an arbitrary number of styles (overlapping).

I've considered lists/arrays which contain text ranges but they don't allow quick insert without recalculating the positions of all styles after the insert point.

A tree structure with relative offsets supports #2 but the tree will degenerate fast when I add lots of styles to the text.

Any other options?

解决方案

I have never developped an editor, but how about this:

I believe it would be possible to expand the scheme that is used to store the text characters themeselves, depending of course on the details of your implementation (language, toolkits etc) and your performance and resource usage requirements.

Rather than use a separate data structure for the styles, I'd prefer having a reference that would accompany each character and point to an array or list with the applicable characters. Characters with the same set of styles could point to the same array or list, so that one could be shared.

Character insertions and deletions would not affect the styles themeselves, apart from changing the number of references to them, which could be handled with a bit of reference counting.

Depending on your programming language you could even compress things a bit more by pointing halfway into a list, although the additional bookkeeping for this might in fact make it more inefficient.

The main issue with this suggestion is the memory usage. In an ASCII editor written in C, bundling a pointer with each char would raise its effective memory usage from 1 byte to 12 bytes on a 64 bit system, due to struct alignment padding.

I would look about breaking the text into small variable size blocks that would allow you to efficiently compress the pointers. E.g. a 32-character block might look like this in C:

struct _BLK_ {
    unsigned char size;
    unsigned int styles;
    char content[];
}

The interesting part is the metadata processing on the variable part of the struct, which contains both the stored text and any style pointers. The size element would indicate the number of characters. The styles integer (hence the 32-character limit) would be seen as a set of 32 1-bit fields, with each one indicating whether a character has its own style pointer, or whether it should use the same style as the previous character. This way a 32-char block with a single style would only have the additional overhead of the size char, the styles mask and a single pointer, along with any padding bytes. Inserting and deleting characters into a small array like this should be quite fast.

As for the text storage itself, a tree sounds like a good idea. Perhaps a binary tree where each node value would be the sum of the children values, with the leaf nodes eventually pointing to text blocks with their size as their node value? The root node value would be the total size of the text, with each subtree ideally holding half of your text. You'd still have to auto-balance it, though, with sometimes having to merge half-empty text blocks.

And in case you missed it, I am no expert in trees :-)

EDIT:

Apparently what I suggested is a modified version of this data structure:

http://en.wikipedia.org/wiki/Rope_%28computer_science%29

as referenced in this post:

Data structure for text editor

EDIT 2:

Deletion in the proposed data structure should be relatively fast, as it would come down to byte shifting in an array and a few bitwise operations on the styles mask. Insertion is pretty much the same, unless a block fills up. It might make sense to reserve some space (i.e. some bits in the styles mask) within each block to allow for future insertions directly in the blocks, without having to alter the tree itself for relatively small amounts of new text.

Another advantage of bundling characters and styles in blocks like this is that its inherent data locality should allow for more efficient use of the CPU cache than other alternatives, thus improving the processing speed to some extent.

Much like any complex data structure, though, you'd probably need either profiling with representative test cases or an adaptive algorithm to determine the optimal parameters for its operation (block size, any reserved space etc).

这篇关于最有效的数据结构为文本添加样式的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆