Will Subversion efficiently store OpenXML Office documents?

Question

I have been managing Subversion as an engineering document storage repository for my company. It is working fairly well; however, I have a question about how MS Office 2007 formats are (or should be) handled by Subversion.

I'm looking at an Excel 2007 spreadsheet (extension .xlsx) in my working copy to which Subversion has applied the svn:mime-type property application/octet-stream. This means that Subversion is treating it as binary, right?

I was hoping that the new MS Office document formats would be stored efficiently by Subversion. My understanding is that a full copy of a binary file will be made on every commit of that file, whereas if the file is text, a small change to the file will result in a small amount of additional data being added to the repository (in a typical situation at least).

I don't understand much of the details of XML, but I thought that an XML file was text, and that it would therefore be efficiently stored by Subversion.

Is it possible to configure Subversion so that MS Office OpenXML documents are stored efficiently?

Follow-up (2009-11-09): I've found that Office documents can be stored as plain text using the Office 2003 XML document formats (Excel: XML Spreadsheet 2003; Word: Word XML Document). There is a warning about loss of formatting, but I have yet to encounter any noticeable loss of formatting.

Answer

From the Wikipedia article on Office Open XML:

An Office Open XML file is a ZIP-compatible OPC package containing XML documents and other resources.

In other words, OpenXML files are actually ZIP files with XML files in them. Compression or encryption "scrambles" the data, sabotaging Subversion's ability to generate deltas between revisions. This is not related to the svn:mime-type property: Subversion considers all files to be binary when generating deltas.
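To see this effect without Subversion, here is a small, self-contained Python experiment. It is an illustration under stated assumptions, not the setup used in the measurements below: it packs a made-up, simplified stand-in for word/document.xml into a one-part ZIP, either uncompressed (ZIP_STORED) or compressed (ZIP_DEFLATED, as in a real .docx), inserts one paragraph, and counts how much of the new package cannot be found anywhere in the old one. The `unmatched_bytes` helper is a hypothetical, crude proxy for delta size, not Subversion's actual delta algorithm.

```python
import io
import random
import string
import zipfile

BLOCK = 64  # block size used by the crude "reusable data" measure below


def make_xml(paragraphs):
    # Simplified stand-in for word/document.xml (not real WordprocessingML).
    body = "".join(f"<p>{text}</p>\n" for text in paragraphs)
    return f"<document>\n{body}</document>\n".encode("utf-8")


def pack(xml, method):
    # Build an in-memory ZIP with a single part. A fixed timestamp keeps the
    # output deterministic, so only the XML content differs between revisions.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        info = zipfile.ZipInfo("word/document.xml", date_time=(2009, 11, 9, 0, 0, 0))
        info.compress_type = method
        zf.writestr(info, xml)
    return buf.getvalue()


def unmatched_bytes(old, new, block=BLOCK):
    # Crude delta proxy (NOT Subversion's algorithm): count the bytes of `new`,
    # taken in fixed-size blocks, that cannot be found anywhere in `old`.
    missing = 0
    for start in range(0, len(new), block):
        chunk = new[start:start + block]
        if old.find(chunk) == -1:
            missing += len(chunk)
    return missing


rng = random.Random(2009)  # fixed seed so the made-up text is reproducible


def random_paragraph():
    words = ("".join(rng.choice(string.ascii_lowercase) for _ in range(rng.randint(3, 9)))
             for _ in range(12))
    return " ".join(words)


rev1 = [random_paragraph() for _ in range(300)]
# Revision 2: one new paragraph inserted near the start of the document.
rev2 = rev1[:5] + ["A paragraph pasted in from another document."] + rev1[5:]

results = {}
for name, method in (("stored", zipfile.ZIP_STORED), ("deflated", zipfile.ZIP_DEFLATED)):
    old, new = pack(make_xml(rev1), method), pack(make_xml(rev2), method)
    results[name] = (len(new), unmatched_bytes(old, new))
    print(f"{name:8}: package {len(new):6} bytes, {results[name][1]:6} bytes not reusable")
```

For the stored package, only the blocks touching the ZIP headers (which carry the new CRC and sizes) and the inserted paragraph should fail to match. For the deflated package you will typically see most of the compressed stream after the edit fail to match, because the compressed bits shift and change from the edit point onward.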

In Dutch we have a saying: "measuring is knowing". The graph below shows the results of an experiment in which I imported a 500 KB OpenXML document into an SVN 1.6 repository (revision 1). I then added a paragraph from another document, saved, and committed. This was repeated 5 times (revisions 2 to 6).

As you can see, committing a new docx revision that just adds a paragraph will cost you about 150 KB of disk space. This is still much more efficient than storing a full copy of each revision without the help of a version control system.
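As a rough back-of-the-envelope check for the six revisions, assuming (this is not stated in the measurements) that revision 1 costs about the full 500 KB and each later revision about 150 KB, while a plain copy of each revision costs at least 500 KB because the document only grows:

$$
\underbrace{500 + 5 \times 150}_{\text{Subversion}} = 1250\ \text{KB}
\qquad\text{vs.}\qquad
\underbrace{6 \times 500}_{\text{plain copies}} = 3000\ \text{KB}
$$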

I also repeated the experiment in a separate test repository, unzipping each revision of the docx before committing it. As you can see, the document revisions would be stored much more efficiently if they weren't compressed. It's also interesting to see that Subversion's own data compression is about as efficient as zip: storing the first revision of an uncompressed docx in Subversion takes about the same space as the original docx.
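If you would rather keep a single .docx file than commit its unpacked parts, one option (a different technique from unpacking) is to repack the package with every part stored uncompressed before committing. That leaves the XML visible to Subversion's delta algorithm and leaves compression to Subversion itself. The sketch below is a minimal version of that idea; `repack_stored` is a hypothetical helper name, and the approach assumes that Office opens packages whose parts are stored uncompressed. Check this with your Office version before relying on it.

```python
import zipfile
from pathlib import Path


def repack_stored(src: Path, dst: Path) -> None:
    """Copy every part of the ZIP package `src` into `dst` without compression.

    Hypothetical helper: part names, order, timestamps and attributes are kept;
    only the compression method changes to ZIP_STORED. `src` and `dst` must differ.
    """
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w", zipfile.ZIP_STORED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            info = zipfile.ZipInfo(item.filename, date_time=item.date_time)
            info.compress_type = zipfile.ZIP_STORED
            info.external_attr = item.external_attr  # keep file attributes
            zout.writestr(info, data)
```

Usage would be something like `repack_stored(Path("report.docx"), Path("report-stored.docx"))`. Office will typically write a compressed package again when you save, so the repacking has to be repeated before every commit, for example from a small wrapper script.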

YMMV.
