How are version control histories stored and calculated?


Question

Consider this simple Python code, which demonstrates a very simple version control design for a dictionary:

def build_current(history):
    """Replay every action in order and return the resulting dictionary."""
    current = {}
    for action, key, value in history:
        assert action in ('set', 'del')
        if action == 'set':
            current[key] = value
        elif action == 'del':
            del current[key]
    return current

history = []
history.append(('set', '1', 'one'))
history.append(('set', '2', 'two'))
history.append(('set', '3', 'three'))
print(build_current(history))
history.append(('del', '2', None))
history.append(('set', '1', 'uno'))
history.append(('set', '4', 'four'))
print(build_current(history))
# Show every action that ever touched key '2'.
for action, key, value in history:
    if key == '2':
        print('(%s, %s, %s)' % (action, key, value))

Notice that by using the history list you can reconstruct the dictionary in any state it has ever been in. I consider this a "forward build" (for lack of a better term) because to build the current dictionary one must start at the beginning and process the entire history list. I consider this the most obvious and straightforward approach.
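To make the "any state it has ever been in" claim concrete, here is a minimal sketch that replays only a prefix of the history. The helper name `build_at` and its `revision` parameter are assumptions introduced for illustration; they are not part of the question's code.

```python
def build_at(history, revision):
    """Return the dictionary as it looked after the first `revision` actions."""
    state = {}
    for action, key, value in history[:revision]:
        if action == 'set':
            state[key] = value
        elif action == 'del':
            del state[key]
        else:
            raise ValueError('unknown action: %r' % (action,))
    return state

history = [
    ('set', '1', 'one'),
    ('set', '2', 'two'),
    ('set', '3', 'three'),
    ('del', '2', None),
    ('set', '1', 'uno'),
    ('set', '4', 'four'),
]

print(build_at(history, 3))             # state after the first three actions
print(build_at(history, len(history)))  # latest state
```

The cost of rebuilding a state grows with the number of actions replayed, which is exactly the drawback of the "forward build".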

As far as I have heard, early version control systems used this "forward build" process, but it was not optimal because most users care more about the recent versions of a build. Also, users don't want to download the entire history when they only care about seeing the latest build.

My question then is, what other approaches exist for storing histories in a version control system? Perhaps a "backwards build" could be used? That might allow users to download only recent revisions without needing the entire history. I've also seen a few different formats for storing the history, namely changesets, snapshots, and patches. What are the differences between changesets, snapshots, and patches?

Of the popular modern version control systems, how does each store its history, and what are the advantages of the various designs?

Solution

You mentioned these three methods of storing (file) history:

  1. patch: a patch is the (usually textual, but binary patches are also possible) representation of the difference between two files. It is the output of the Unix command diff and can be applied with the Unix command patch. Many version control systems store file history as patches (e.g. SVN and CVS; Git records snapshots but delta-compresses them when packing). These patches are sometimes technically called "deltas", after the Greek letter "Δ" used to denote the difference between two things.
  2. changeset: a changeset combines changes that belong together, possibly across different files, into a single entity. Not all version control systems support changesets (the most notable exceptions are CVS and SourceSafe). Developers use changesets to avoid broken builds. For example, if you change the signature of a method in one file and the call to it in a second file, both changes must be in place for the program to run; otherwise you get an error. See also here for the difference between changeset and patch.
  3. snapshot: a snapshot is a full copy of the state of a file or file system at a given point in time. Snapshots are usually quite large, and whether to use them depends on the performance characteristics you need. A snapshot is always redundant with respect to a list of patches; however, to retrieve information faster, version control systems sometimes mix or combine patches and snapshots.
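The three terms can be illustrated with the dictionary model from the question. This is only a sketch: each "file" is a dict, a patch is a list of set/del actions, and the names `make_patch`, `apply_patch`, `api.py` and `caller.py` are hypothetical, chosen for the example.

```python
def make_patch(old, new):
    """Return the list of set/del actions that turns `old` into `new`."""
    patch = [('del', key, None) for key in sorted(old) if key not in new]
    patch += [('set', key, new[key]) for key in sorted(new)
              if key not in old or old[key] != new[key]]
    return patch

def apply_patch(state, patch):
    """Return a new dict with `patch` applied; `state` itself is not modified."""
    result = dict(state)
    for action, key, value in patch:
        if action == 'set':
            result[key] = value
        else:
            del result[key]
    return result

# Snapshots: full copies of two "files" before and after an edit.
before = {'api.py': {'1': 'one', '2': 'two'}, 'caller.py': {'a': 'x'}}
after = {'api.py': {'1': 'uno', '3': 'three'}, 'caller.py': {'a': 'y'}}

# Changeset: one patch per file, recorded together as a single unit.
changeset = {name: make_patch(before[name], after[name]) for name in before}

# Applying the whole changeset to the old snapshot reproduces the new one.
rebuilt = {name: apply_patch(before[name], changeset[name]) for name in before}
print(changeset['api.py'])
print(rebuilt == after)
```

Applying only one of the two patches would leave the "files" in a mixed state, which is the broken-build situation a changeset is meant to prevent.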

Subversion uses forward deltas (aka patches) in FSFS repositories and backward deltas in BDB repositories. Note that these implementations have different performance characteristics:

  • Forward deltas are fast to commit but slow to check out (because the "current" version must be reconstructed).

  • Backward deltas are fast to check out but slow to commit, because each commit must build the new "current" full text and rewrite the previous "current" as a delta against it.
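The trade-off between the two layouts can be sketched with the same dictionary model. The classes `ForwardStore` and `BackwardStore` below are hypothetical toy stores, not Subversion's actual FSFS or BDB code; revision 0 is assumed to be the empty dictionary, and each commit is passed in as a forward delta.

```python
def diff(old, new):
    """Actions that turn dict `old` into dict `new`."""
    dels = [('del', k, None) for k in old if k not in new]
    sets = [('set', k, v) for k, v in new.items() if k not in old or old[k] != v]
    return dels + sets

def patch(state, delta):
    """Return a copy of `state` with `delta` applied."""
    result = dict(state)
    for action, key, value in delta:
        if action == 'set':
            result[key] = value
        else:
            result.pop(key)
    return result

class ForwardStore:
    """Keeps only forward deltas: commit appends, checkout replays."""
    def __init__(self):
        self.deltas = []            # deltas[i] turns revision i into i + 1

    def commit(self, delta):
        self.deltas.append(list(delta))

    def checkout(self, rev):
        state = {}
        for delta in self.deltas[:rev]:            # replay from the start
            state = patch(state, delta)
        return state

class BackwardStore:
    """Keeps the newest revision in full plus deltas that lead back in time."""
    def __init__(self):
        self.head = {}              # full copy of the newest revision
        self.undo = []              # undo[i] turns revision i + 1 back into i

    def commit(self, delta):
        new_head = patch(self.head, delta)
        self.undo.append(diff(new_head, self.head))  # extra work per commit
        self.head = new_head

    def checkout(self, rev):
        state = dict(self.head)                    # newest revision is free
        for delta in reversed(self.undo[rev:]):    # walk back only as needed
            state = patch(state, delta)
        return state

commits = [
    [('set', '1', 'one'), ('set', '2', 'two')],
    [('del', '2', None), ('set', '1', 'uno')],
    [('set', '4', 'four')],
]
forward, backward = ForwardStore(), BackwardStore()
for delta in commits:
    forward.commit(delta)
    backward.commit(delta)

print(forward.checkout(3))
print(backward.checkout(1))
```

In this sketch `ForwardStore.checkout` applies more deltas the newer the revision is, while `BackwardStore.checkout` applies more deltas the older it is; `BackwardStore.commit` pays for that by building a new full copy and a reverse delta each time.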

Also note that FSFS uses a "skipping delta" algorithm (Subversion's documentation calls these skip-deltas), which minimizes the number of jumps needed to reconstruct a specific version. Unlike Mercurial's snapshots, however, skip-deltas are not optimized for size; they only minimize the number of revisions you have to read to build a full version, regardless of the overall size.

Here is a small ASCII diagram (copied from the specification) of a file with 9 revisions:

0 <- 1    2 <- 3    4 <- 5    6 <- 7
0 <------ 2         4 <------ 6
0 <---------------- 4
0 <------------------------------------ 8 <- 9

where "0 <- 1" means that the delta base for revision 1 is revision 0.

For N revisions, the number of jumps needed to reconstruct any version grows only as O(log N).
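The delta bases in the diagram are consistent with a simple rule: clear the lowest set bit of the revision number. The sketch below uses that rule, which is an assumption inferred from the diagram rather than code taken from Subversion; the function names `delta_base` and `delta_chain` are hypothetical.

```python
def delta_base(rev):
    """Delta base of `rev`: the number with its lowest set bit cleared."""
    return rev & (rev - 1)

def delta_chain(rev):
    """Revisions that must be read to rebuild `rev`, newest first."""
    chain = [rev]
    while rev > 0:
        rev = delta_base(rev)
        chain.append(rev)
    return chain

# Reproduce the "base <- revision" pairs from the diagram.
for rev in range(1, 10):
    print(delta_base(rev), '<-', rev)

print(delta_chain(9))   # [9, 8, 0]
print(delta_chain(7))   # [7, 6, 4, 0]
```

Under this rule the number of jumps for a revision equals the number of 1 bits in its binary representation, which is why it stays logarithmic in the revision number.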

Another nice property of FSFS is that an older revision is written only once; after that, later operations only read it. That is why Subversion repositories are very stable: barring a hardware failure on your hard disk, you should be able to recover a working repository even if the last commit was corrupted, because all older revisions are still intact.

With the BDB backend, you constantly rewrite the current revision on every check-in/commit, which makes the process prone to data corruption. And because the full text is stored only in the current revision, corrupting the data during a commit will likely destroy large parts of your history.
