What's the most compact way to store diffs in a database?

Question

I want to implement something similar to Wikimedia's revision history. What would be the best PHP functions/libraries/extensions/algorithms to use?

I would like the diffs to be as compact as possible, but I'm happy to be restricted to only showing the difference between each revision and its sibling, and only being able to roll back one revision at a time.

In some cases only a few characters may change, whereas in other cases the whole string could change, so I'm keen to understand whether some techniques are better for small changes than for large ones, and if in some cases it's more efficient to simply store whole copies.

Backing the whole system with something like Git or SVN seems a bit extreme, and I don't really want to store files on disk.

Recommended answer

It is much easier to store each record in its entirety than it is to store diffs of them. Then, if you want a diff of two revisions, you can generate one as needed using the PEAR Text_Diff library.
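As a rough illustration of generating a diff on demand from two fully stored revisions, here is a minimal sketch. It uses Python's standard difflib module purely for illustration, not the PHP library named above; the function name diff_revisions and the sample texts are made up for this example.

```python
import difflib

def diff_revisions(old_text: str, new_text: str,
                   old_label: str = "old", new_label: str = "new") -> str:
    """Return a unified diff between two full revisions, computed on demand."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(old_lines, new_lines,
                             fromfile=old_label, tofile=new_label)
    )

# Hypothetical sample revisions, as they might be read back from the database.
rev1 = "The quick brown fox\njumps over the lazy dog\n"
rev2 = "The quick red fox\njumps over the lazy dog\n"

print(diff_revisions(rev1, rev2, "rev 1", "rev 2"))
```

Nothing diff-related is stored here: the diff exists only while it is being displayed, so the database schema stays a plain list of full texts.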

I like to store all versions of the record in a single table and retrieve the most recent one with MAX(revision), a "current" boolean attribute, or similar. Others prefer to denormalize and have a mirror table that holds non-current revisions.
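The single-table approach could look roughly like the sketch below. It uses Python's sqlite3 module with an in-memory database only to keep the example self-contained; the table name page_revision, its columns, and the helper functions are assumptions made for illustration, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE page_revision (
        page_id  INTEGER NOT NULL,
        revision INTEGER NOT NULL,
        body     TEXT    NOT NULL,
        PRIMARY KEY (page_id, revision)
    )
""")

def save_revision(conn, page_id, body):
    """Store body as the next revision of page_id and return its number.

    Not safe under concurrent writers as written; a real system would
    wrap this in a transaction or use a database sequence.
    """
    (latest,) = conn.execute(
        "SELECT COALESCE(MAX(revision), 0) FROM page_revision WHERE page_id = ?",
        (page_id,),
    ).fetchone()
    conn.execute(
        "INSERT INTO page_revision (page_id, revision, body) VALUES (?, ?, ?)",
        (page_id, latest + 1, body),
    )
    conn.commit()
    return latest + 1

def current_revision(conn, page_id):
    """Return (revision, body) for the newest revision, or None if there is none."""
    return conn.execute(
        """SELECT revision, body FROM page_revision
           WHERE page_id = ?
             AND revision = (SELECT MAX(revision) FROM page_revision
                             WHERE page_id = ?)""",
        (page_id, page_id),
    ).fetchone()

def rollback_one(conn, page_id):
    """Roll back a single step by deleting the newest revision."""
    conn.execute(
        """DELETE FROM page_revision
           WHERE page_id = ?
             AND revision = (SELECT MAX(revision) FROM page_revision
                             WHERE page_id = ?)""",
        (page_id, page_id),
    )
    conn.commit()
```

With this layout, rolling back one revision at a time is a single DELETE, and any two revisions can be compared by selecting both bodies.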

If you store diffs instead, your schema and algorithms become much more complex. You then need to store at least one "full" revision and multiple "diff" versions, and reconstruct a full version from a set of diffs whenever you need a full version. (This is how SVN stores things. Git, by contrast, models each revision as a full snapshot rather than a diff, although its packfiles delta-compress objects behind the scenes.)
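To give a feel for the extra machinery, here is a minimal sketch of one full base revision plus a chain of forward deltas. The delta format (a JSON list of copy ranges and literal strings) is invented for this example and is not SVN's actual storage format; make_delta, apply_delta, and rebuild are hypothetical names.

```python
import difflib
import json

def make_delta(old: str, new: str) -> str:
    """Encode `new` relative to `old` as a JSON list of operations.

    A two-element list [i1, i2] means "copy old[i1:i2]";
    a string means "insert this literal text".
    """
    ops = []
    # Character-level matching; this can be slow on very large texts.
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append([i1, i2])
        elif tag in ("replace", "insert"):
            ops.append(new[j1:j2])
        # "delete" needs no operation: that range of `old` is never copied.
    return json.dumps(ops)

def apply_delta(old: str, delta: str) -> str:
    """Rebuild the newer text from `old` and a delta made by make_delta."""
    parts = []
    for op in json.loads(delta):
        parts.append(op if isinstance(op, str) else old[op[0]:op[1]])
    return "".join(parts)

def rebuild(base: str, deltas: list) -> str:
    """Apply a chain of deltas, oldest first, to the full base revision."""
    text = base
    for delta in deltas:
        text = apply_delta(text, delta)
    return text

# Hypothetical history: one full base plus two stored deltas.
revisions = ["hello world", "hello brave world", "goodbye brave world!"]
deltas = [make_delta(a, b) for a, b in zip(revisions, revisions[1:])]
latest = rebuild(revisions[0], deltas)
```

Note that reading the latest revision now costs one delta application per stored diff, which is why delta-based stores usually insert periodic full copies or keep the newest revision in full and store the deltas backwards.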

Programmer time is expensive, but disk space is usually cheap. Please consider whether storing each revision in full is really a problem.
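One way to check this on your own data is simply to measure it. The sketch below compares the byte size of a full copy, with and without zlib compression, against the size of a line-based unified diff, for a small edit and for a complete rewrite. The function name storage_cost and the sample texts are made up for illustration, and the numbers it produces say nothing about any particular real dataset.

```python
import difflib
import zlib

def storage_cost(old: str, new: str) -> dict:
    """Return byte sizes for storing `new` in full versus as a diff against `old`."""
    full = new.encode("utf-8")
    diff = "".join(
        difflib.unified_diff(old.splitlines(keepends=True),
                             new.splitlines(keepends=True))
    ).encode("utf-8")
    return {
        "full": len(full),
        "full_zlib": len(zlib.compress(full, 9)),
        "diff": len(diff),
        "diff_zlib": len(zlib.compress(diff, 9)),
    }

# Hypothetical 200-line article.
old = "".join(f"line {i}: some article text\n" for i in range(200))

# Case 1: a single line is edited.
small_edit = old.replace("line 100:", "line 100 (edited):")
# Case 2: every line is rewritten.
rewrite = "".join(f"row {i}: entirely different words\n" for i in range(200))

small_cost = storage_cost(old, small_edit)
rewrite_cost = storage_cost(old, rewrite)
print("small edit:", small_cost)
print("rewrite:   ", rewrite_cost)
```

For the single-line edit the diff is far smaller than the full text, while for the complete rewrite the diff is larger than the full text because it carries both the removed and the added lines. A hybrid scheme would store whichever form is smaller for each revision, at the cost of the extra complexity described above.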
