How does one store history of edits effectively?


Question

I was just wondering how sites like Stack Overflow and Wikipedia store the history of edits indefinitely and allow users to roll back edits. Can someone recommend any resources, books, or articles on how to do this using any suitable technology (such as databases)?

Thanks a lot!

Answer

There are a number of options. The simplest, of course, is to record every version independently. For a site like Stack Overflow, where posts aren't usually edited very many times, this is appropriate. For something like Wikipedia, however, one needs to be more clever to save space.
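As a rough illustration of the "record every version independently" option, here is a minimal sketch using Python's built-in sqlite3 module. The table name `revisions`, its columns, and the helper functions are all made up for this example; they are not how Stack Overflow actually stores posts. A rollback is modelled as saving a new revision with the old body, so no history is ever lost.

```python
import sqlite3

def open_store(path=":memory:"):
    # Hypothetical schema: one full copy of the body per (post, revision).
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS revisions ("
        " post_id INTEGER NOT NULL,"
        " rev INTEGER NOT NULL,"
        " body TEXT NOT NULL,"
        " PRIMARY KEY (post_id, rev))"
    )
    return db

def save_revision(db, post_id, body):
    # Next revision number is one past the highest stored so far.
    row = db.execute(
        "SELECT COALESCE(MAX(rev), 0) FROM revisions WHERE post_id = ?",
        (post_id,),
    ).fetchone()
    rev = row[0] + 1
    db.execute(
        "INSERT INTO revisions (post_id, rev, body) VALUES (?, ?, ?)",
        (post_id, rev, body),
    )
    db.commit()
    return rev

def get_revision(db, post_id, rev=None):
    # rev=None means "latest revision".
    if rev is None:
        row = db.execute(
            "SELECT body FROM revisions WHERE post_id = ?"
            " ORDER BY rev DESC LIMIT 1",
            (post_id,),
        ).fetchone()
    else:
        row = db.execute(
            "SELECT body FROM revisions WHERE post_id = ? AND rev = ?",
            (post_id, rev),
        ).fetchone()
    return row[0] if row else None

def rollback(db, post_id, rev):
    # Rolling back appends a copy of the old body as a new revision.
    old = get_revision(db, post_id, rev)
    if old is None:
        raise KeyError((post_id, rev))
    return save_revision(db, post_id, old)
```

For example, after saving two bodies for post 1, `rollback(db, 1, 1)` creates revision 3 with the text of revision 1, while revision 2 stays readable.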

In the case of Wikipedia, pages are initially stored with each version separate, in the text table. Periodically, a number of older revisions are compressed together, then packed into a single field. Since there will be a lot of repetition, you save a lot of space this way.
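To see why packing several revisions into one compressed field helps, here is a small sketch. The function names and the JSON-plus-zlib encoding are assumptions made for this example, not the format MediaWiki actually uses; the point is only that the compressor can exploit repetition across revisions when they share one stream.

```python
import json
import zlib

def pack_revisions(texts):
    # Serialize a list of revision bodies and compress them as one stream,
    # so text repeated from one revision to the next is stored only once.
    payload = json.dumps(texts).encode("utf-8")
    return zlib.compress(payload, 9)

def unpack_revisions(blob):
    # Inverse of pack_revisions: returns the original list of bodies.
    return json.loads(zlib.decompress(blob).decode("utf-8"))

if __name__ == "__main__":
    base = " ".join(f"word{i}" for i in range(200))
    revisions = [base + f" edit {k}" for k in range(20)]
    packed = len(pack_revisions(revisions))
    separate = sum(len(zlib.compress(r.encode("utf-8"), 9)) for r in revisions)
    print("packed together:", packed, "bytes")
    print("compressed separately:", separate, "bytes")
```

The trade-off is that reading any one old revision means decompressing the whole group, which is presumably why this is applied to older, rarely read revisions.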

You might also want to look into how some version control systems do it. For example, Subversion uses skip deltas (http://svn.apache.org/repos/asf/subversion/trunk/notes/skip-deltas), where revisions are stored as a difference from a revision halfway down the history. This means that one will have to examine at most O(log n) revisions to reconstruct one's revision of interest.
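The logarithmic bound can be made concrete with a small sketch. As I understand the Subversion notes, one way to pick the delta base for change number n is to clear the lowest set bit of n; treat that rule, and the assumption that change 0 is stored in full, as a simplification for illustration rather than a description of every Subversion backend. The function names are made up for this example.

```python
def skip_delta_base(n):
    # Clear the lowest set bit of n, e.g. 54 (0b110110) -> 52 (0b110100).
    return n & (n - 1)

def delta_chain(n):
    # Changes that must be read to rebuild change n, ending at change 0,
    # which this sketch assumes is stored in full.
    chain = [n]
    while n > 0:
        n = skip_delta_base(n)
        chain.append(n)
    return chain

if __name__ == "__main__":
    for n in (1, 7, 54, 1000):
        print(n, delta_chain(n))
```

Each step removes one set bit, so the number of deltas to apply equals the number of 1 bits in n, which is never more than the bit length of n.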

Git, on the other hand, uses something more similar to Wikipedia's approach.

Revisions are stored as individually compressed 'loose' objects at first. Periodically, Git takes all of the loose objects, sorts them according to a somewhat complex heuristic, builds compressed deltas between 'nearby' objects, and dumps the result as a packfile.

The number of revisions that need to be read to reconstruct a file is bounded by an argument to the pack-building process. An interesting property of this approach is that, in some cases, deltas can be built between objects that are unrelated.
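The idea of a bounded delta chain can be sketched with a toy pack builder. This is not Git's packfile format or its sorting heuristic: here each object is simply deltified against the previous one in the list, using difflib from the standard library, and `max_depth` is a stand-in for the depth limit mentioned above (`git pack-objects` exposes this as `--depth`). All function names are hypothetical.

```python
import difflib

def make_delta(base, target):
    # Encode target as copy-from-base and insert-literal operations.
    ops = []
    matcher = difflib.SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))
        # "delete" needs no operation: the base span is just not copied.
    return ops

def apply_delta(base, ops):
    out = []
    for op in ops:
        if op[0] == "copy":
            out.append(base[op[1]:op[2]])
        else:
            out.append(op[1])
    return "".join(out)

def build_pack(objects, max_depth=3):
    # Store each object as a delta against its neighbour unless that would
    # make the chain longer than max_depth; then store it in full instead.
    pack = []
    for i, text in enumerate(objects):
        if i > 0 and pack[i - 1]["depth"] < max_depth:
            pack.append({
                "base": i - 1,
                "delta": make_delta(objects[i - 1], text),
                "depth": pack[i - 1]["depth"] + 1,
            })
        else:
            pack.append({"base": None, "full": text, "depth": 0})
    return pack

def read_object(pack, i):
    # Follow the chain back to a full object; at most max_depth hops.
    entry = pack[i]
    if entry["base"] is None:
        return entry["full"]
    return apply_delta(read_object(pack, entry["base"]), entry["delta"])
```

Nothing in `make_delta` requires the base to be an earlier version of the same file, which is how deltas between unrelated but similar objects become possible.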
