Does operational transformation work on structured documents such as HTML if simply treated as plain text?

Question

The FAQ of Google Wave Protocol says that [HTML] "does not have desirable properties" and that "HTML makes OT (Operational Transforms) difficult if not impossible" [1]. Why is this so? What problems arise if HTML is treated simply as plain text and then OT applied?

  1. http://www.waveprotocol.org/faq#TOC-What-s-the-XML-schema-for-waves

Recommended answer

I'm assuming here that you understand the basics of OT. The principal problem with doing OT on HTML as plain text is merging HTML tags. As a simple example, say we had the following document:

Hello world

Alice then decides that world should be in bold:

Hello <b>world</b>

This can be represented with a double insert operation in OT, schematically:

Edit A: Keep 6 : Insert "<b>" : Keep 5 : Insert "</b>"

If Bob decided that 'world' should be italic before he saw Alice's edit, he would add the operation

Edit B: Keep 6 : Insert "<i>" : Keep 5 : Insert "</i>"

If the server received Bob's edit after Alice's, it would need to transform B against A to become B'.

The Keep statements are unchanged through transformation, but Insert "<i>" transformed over Insert "<b>" can become either Keep 3 : Insert "<i>" or Insert "<i>" : Keep 3, where 3 is the length of the "<b>" that Alice already inserted. Usually the server will be configured to place the later edit after the first edit.

Edit B': Keep 6 : Keep 3 : Insert "<i>" : Keep 5 : Keep 4 : Insert "</i>"
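The transformation itself can be sketched as follows. This is a simplified version that only handles Keep and Insert (no deletes), reuses the hypothetical tuple encoding `("keep", n)` / `("insert", text)`, and hard-codes the tie-break described above: when both edits insert at the same position, the later edit lands after the earlier one. Adjacent Keeps are merged, so the result reads Keep 9 : Insert "<i>" : Keep 9 : Insert "</i>" (the second 9 is 5 for "world" plus 4 for the inserted "</b>").

```python
def transform(b, a):
    """Transform op b against concurrent op a so it can be applied after a.

    Both ops use only ("keep", n) and ("insert", text) components.
    On a tie, a's insert comes first and b's insert is placed after it.
    """
    out = []

    def emit(kind, arg):
        # Merge adjacent keeps to keep the result compact.
        if kind == "keep" and out and out[-1][0] == "keep":
            out[-1] = ("keep", out[-1][1] + arg)
        else:
            out.append((kind, arg))

    a, b = list(a), list(b)  # local copies; the inputs are not modified
    i = j = 0
    while i < len(a) or j < len(b):
        if i < len(a) and a[i][0] == "insert":
            # a's text is already in the document, so b' must skip over it.
            emit("keep", len(a[i][1]))
            i += 1
        elif j < len(b) and b[j][0] == "insert":
            emit("insert", b[j][1])
            j += 1
        elif i < len(a) and j < len(b):
            # Both ops keep: consume the span they have in common.
            n = min(a[i][1], b[j][1])
            emit("keep", n)
            a[i] = ("keep", a[i][1] - n)
            b[j] = ("keep", b[j][1] - n)
            if a[i][1] == 0:
                i += 1
            if b[j][1] == 0:
                j += 1
        elif j < len(b):
            emit(*b[j])  # a is exhausted; b's remaining keeps pass through
            j += 1
        else:
            i += 1  # b is exhausted; a's remaining keeps need no output
    return out


edit_a = [("keep", 6), ("insert", "<b>"), ("keep", 5), ("insert", "</b>")]
edit_b = [("keep", 6), ("insert", "<i>"), ("keep", 5), ("insert", "</i>")]
print(transform(edit_b, edit_a))
# [('keep', 9), ('insert', '<i>'), ('keep', 9), ('insert', '</i>')]
```

Note that nothing in this function knows that "<b>" and "</b>" belong together; it only sees character offsets.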

Here the problem becomes obvious. Applying A then B' to the original string gives invalid HTML, because the tags are closed in the wrong order:

Hello <b><i>world</b></i>
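A quick way to see what is wrong with that result is a stack-based nesting check. The sketch below is deliberately naive: it assumes attribute-less tags and ignores void elements, comments, and everything else a real HTML parser handles. The function name is hypothetical.

```python
import re


def is_well_nested(html):
    """Return True if every closing tag matches the most recently opened tag."""
    stack = []
    # Matches simple tags such as <b> or </b>; the first group is "/" for closers.
    for closing, name in re.findall(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)>", html):
        if not closing:
            stack.append(name)
        elif not stack or stack.pop() != name:
            return False
    return not stack  # leftover open tags also count as invalid


print(is_well_nested("Hello <b><i>world</b></i>"))  # False
print(is_well_nested("Hello <b><i>world</i></b>"))  # True
```

A plain-text OT server has no reason to run a check like this, which is why the misnested output goes through unnoticed.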

Theoretically this could be solved by varying pre and post inserts, but this would get hairy for more complicated examples, potentially involving a full document scan for every transformation.

As the other answer noted, this mess can be avoided by using out-of-band annotations plus plain text. Another approach, which I've only seen so far in academic papers, is to treat the XML structure as a tree, with OT operations for node addition and deletion, e.g.:

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.100.74
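To illustrate the out-of-band annotation idea mentioned above, here is a rough sketch. It assumes the document is stored as plain text plus a list of `(start, end, tag)` ranges, which is my own simplification and not Wave's actual data model. The HTML is generated only at render time, so concurrent formatting edits can never produce misnested tags, whatever order they arrive in.

```python
def render(text, annotations):
    """Render plain text plus (start, end, tag) ranges as properly nested HTML.

    Simplification: the text is not HTML-escaped and tags carry no attributes.
    """
    # Collect the set of tags that apply to each character position.
    active = [set() for _ in text]
    for start, end, tag in annotations:
        for k in range(start, end):
            active[k].add(tag)

    out = []
    open_tags = []  # tags currently open, in the order they were opened
    for ch, tags in zip(text, active):
        wanted = sorted(tags)
        if wanted != open_tags:
            # Close everything in reverse order, then reopen what is needed.
            out.extend(f"</{t}>" for t in reversed(open_tags))
            out.extend(f"<{t}>" for t in wanted)
            open_tags = wanted
        out.append(ch)
    out.extend(f"</{t}>" for t in reversed(open_tags))
    return "".join(out)


text = "Hello world"
alice = (6, 11, "b")  # Alice: make "world" bold
bob = (6, 11, "i")    # Bob: make "world" italic
print(render(text, [alice, bob]))  # Hello <b><i>world</i></b>
print(render(text, [bob, alice]))  # Hello <b><i>world</i></b>
```

With this representation, OT only has to transform the text and the range endpoints; tag pairing is no longer something a transformation can break.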
