Will Subversion efficiently store OpenXML Office documents?


Question

I have been managing a Subversion repository that stores engineering documents for my company. It works fairly well, but I have a question about how Subversion handles (or should handle) the MS Office 2007 formats.

I'm looking at an Excel 2007 spreadsheet (extension .xlsx) in my working copy, to which Subversion has applied the svn:mime-type property application/octet-stream. This means that Subversion is treating it as binary, right?

I was hoping that Subversion would store the new MS Office document formats efficiently. My understanding is that a full copy of a binary file is made on every commit of that file, whereas a small change to a text file adds only a small amount of data to the repository (in a typical situation, at least).

I don't understand many of the details of XML, but I thought that an XML file was text and would therefore be stored efficiently by Subversion.

Is it possible to configure Subversion so that MS Office OpenXML documents are stored efficiently?

Follow-up (2009-11-09): I've found that Office documents can be stored as plain text using the Office 2003 XML document formats (Excel: XML Spreadsheet 2003; Word: Word XML Document). There is a warning about loss of formatting, but I have yet to encounter any noticeable loss.

Recommended answer

From the Wikipedia article on OpenXML:

An Office Open XML file is a ZIP-compatible OPC package containing XML documents and other resources.

In other words, OpenXML files are actually zip files with XML files in them. Compression or encryption "scrambles" the data, sabotaging Subversion's ability to generate compact deltas between revisions. This is unrelated to the svn:mime-type property: Subversion treats all files as binary when generating deltas.
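The "scrambling" effect can be illustrated with a small sketch. It is not Subversion's delta algorithm: it uses Python's zlib (the same DEFLATE family that zip uses) and a crude similarity measure, the share of bytes two versions have in common at their start and end. The helper names and the synthetic document are assumptions made up for this illustration.

```python
import zlib


def shared_fraction(old: bytes, new: bytes) -> float:
    """Fraction of `new` covered by the prefix and suffix it shares with `old`."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    return (prefix + suffix) / len(new)


# Synthetic stand-in for the XML inside a document (hypothetical content).
paragraphs = [f"<p>paragraph {i}</p>\n" for i in range(2000)]
base = "".join(paragraphs).encode()

# The same document with a single paragraph inserted in the middle.
edited_paragraphs = paragraphs[:1000] + ["<p>one inserted paragraph</p>\n"] + paragraphs[1000:]
edited = "".join(edited_paragraphs).encode()

raw_shared = shared_fraction(base, edited)
zipped_shared = shared_fraction(zlib.compress(base), zlib.compress(edited))

print(f"uncompressed: {raw_shared:.1%} of the bytes are shared with the old version")
print(f"compressed:   {zipped_shared:.1%} of the bytes are shared with the old version")
```

The uncompressed versions differ only around the inserted paragraph, while the compressed streams diverge over a much larger share of their length, which leaves a delta algorithm far less to reuse.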

In Dutch we have a saying: "measuring is knowing". The graph below shows the results of an experiment in which I imported a 500K OpenXML document into an SVN 1.6 repository (revision 1). I then added a paragraph from another document, saved, and committed. I repeated this five times (revisions 2 to 6).

As you can see, committing a new docx revision that just adds a paragraph costs about 150K of disk space. This is still much more efficient than storing a full copy of each revision without the help of a version control system.

I also repeated the experiment in a separate test repository, uncompressing each revision of the docx before committing it. As you can see, the document revisions are stored much more efficiently when the document is not compressed. It's also interesting that Subversion's own data compression is about as efficient as zip: storing the first revision of an uncompressed docx in Subversion takes about the same space as the original docx.
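The answer does not say how each revision was uncompressed. One possible way, sketched here with Python's standard zipfile module, is to extract the package into a directory and commit that directory instead of the .docx itself. The function name `unpack_package` and the paths in the usage comment are hypothetical.

```python
import zipfile
from pathlib import Path


def unpack_package(package: Path, target_dir: Path) -> list[str]:
    """Extract every part of a ZIP-based Office file into target_dir.

    Returns the sorted names of the extracted parts so the caller can
    see what would be added to the working copy.
    """
    with zipfile.ZipFile(package) as archive:
        archive.extractall(target_dir)
        return sorted(archive.namelist())


# Example usage (hypothetical paths):
# parts = unpack_package(Path("report.docx"), Path("report_unpacked"))
```

Parts that were removed from the document since the previous revision would remain in the target directory unless it is cleared first, so a real workflow would need to handle deletions as well.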

