Programmatically Diff/Merge Xml Documents


Problem description

First, let me begin by telling you the details on the problem I'm trying to solve.

We have a third-party application that uses Xml documents to store all of its business logic, lookup tables and such. The application has a base set of Xml files, and uses a kind of inheritance model to expose inherited Xml files that we're meant to edit to customize the business logic. I say "kind of" due to the horrible implementation of inheritance it uses.

Currently there are over 3000 separate Xml files ranging from 1k to 5000k and totaling about 600MB in size. The only good thing so far is that they all use the same Xsd.

Our problem is, we receive monthly updates to the core Xml files, and we're supposed to put them in place, and upgrade our custom documents to line up with the new version of the base documents. We're currently doing this manually, using DiffDog, and piecing together the documents to create new ones, but I'm trying to wrap my head around the possibility of doing this programmatically. Let me see if I can kind of visualize this for you:

We start off with a structure kind of like the one below, with the base template in place, and a custom template that we can define our custom rules in (which we do a lot):

..\LineOfBusiness\BaseTemplates\BaseXml_1_0_0_0.xml
..\LineOfBusiness\CustomTemplates\Document_1_0_0_0.xml

We're then given an upgrade each month, so now we have a structure like this:

..\LineOfBusiness\BaseTemplates\BaseXml_1_0_0_0.xml
..\LineOfBusiness\BaseTemplates\BaseXml_1_1_0_0.xml
..\LineOfBusiness\CustomTemplates\Document_1_0_0_0.xml

Our job essentially is to create the

..\LineOfBusiness\CustomTemplates\Document_1_1_0_0.xml

document ourselves every month, bringing the changes we made in the previous version into the new version's logic.

I know this system is ridiculous, but I can't change that today. Any ideas on how to tackle this problem would be great. I can tell you what I've thought of so far...

  1. Deserialize the old Base and old Custom documents to get a list of specific differences, then apply those differences to a deserialized version of the new Base and reserialize it to xml.

  2. Apply some sort of annotation process to the Custom Templates, so that we can extract the differences programmatically at upgrade time.

  3. Outsource the upgrade process...
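
Idea 1 is essentially a three-way merge (old Base, old Custom, new Base). A minimal Python sketch of that approach follows. It is only an illustration, not something the vendor's format is known to support: it assumes every element can be matched across files by its tag plus a hypothetical `id` attribute, and it only carries forward text and attribute changes. Anything it cannot decide safely (elements we added or deleted, or values changed on both sides) is returned for manual review.

```python
import copy
import xml.etree.ElementTree as ET


def index(root):
    """Map a path-like key to each element.

    Assumption: siblings are told apart by tag plus an 'id' attribute.
    """
    found = {}

    def walk(elem, prefix):
        key = f"{prefix}/{elem.tag}[{elem.get('id', '')}]"
        found[key] = elem
        for child in elem:
            walk(child, key)

    walk(root, "")
    return found


def value(elem):
    """Comparable content of one element: trimmed text plus attributes."""
    if elem is None:
        return None
    return ((elem.text or "").strip(), dict(elem.attrib))


def three_way_merge(old_base, old_custom, new_base):
    """Return (merged_tree, keys_needing_manual_review)."""
    merged = copy.deepcopy(new_base)
    ob, oc, nb = index(old_base), index(old_custom), index(merged)
    needs_review = []
    for key, custom_elem in oc.items():
        before = value(ob.get(key))
        ours = value(custom_elem)
        theirs = value(nb.get(key))
        if ours == before or ours == theirs:
            continue  # not customized, or the new base already agrees with us
        if key in nb and theirs == before:
            # Customized by us and untouched upstream: carry the change forward.
            nb[key].text = custom_elem.text
            nb[key].attrib.clear()
            nb[key].attrib.update(custom_elem.attrib)
        else:
            needs_review.append(key)  # added by us, or changed on both sides
    # Elements present in the old base that the custom file dropped.
    needs_review += [key for key in ob if key not in oc]
    return merged, needs_review


old_base = ET.fromstring('<rules><rule id="a">1</rule><rule id="b">2</rule></rules>')
old_custom = ET.fromstring('<rules><rule id="a">10</rule><rule id="b">2</rule></rules>')
new_base = ET.fromstring(
    '<rules><rule id="a">1</rule><rule id="b">3</rule><rule id="c">4</rule></rules>'
)
merged, needs_review = three_way_merge(old_base, old_custom, new_base)
print(ET.tostring(merged, encoding="unicode"))
print(needs_review)
```

In this toy input the customization of rule `a` is carried into the new version, the upstream changes to `b` and `c` are kept, and nothing is flagged. With 3000 files the review list is the useful part: it narrows the manual DiffDog work to the elements where both sides actually disagree.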

Solution

If you're using a .NET language, you might be able to accomplish what you're trying to do with Microsoft's XML Diff and Patch tool/library.

I've used it to correctly identify whether there were changes between different xml fragments. This was important for our scenario because the XML we had on disk would differ after being stored in a SQL Server XML column: insignificant whitespace was removed and/or attributes were rearranged (Infoset). Just comparing the text blobs would always detect a difference, when actually the XML elements/values were the same.
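
The same idea (compare the content, not the text blob) can be illustrated outside .NET. The sketch below is not the XmlDiff library; it uses Python's standard `xml.etree.ElementTree.canonicalize` (Python 3.8 or later), which writes attributes in a fixed order. The helper name `same_infoset` is made up for this example, and `strip_text=True` also trims leading and trailing whitespace inside text values, which is an assumption that may not suit every schema.

```python
import xml.etree.ElementTree as ET


def same_infoset(xml_a: str, xml_b: str) -> bool:
    """Compare two XML strings, ignoring attribute order and whitespace-only text."""
    return (ET.canonicalize(xml_a, strip_text=True)
            == ET.canonicalize(xml_b, strip_text=True))


on_disk = '<rule id="1" name="x">\n  <value>10</value>\n</rule>'
from_db = '<rule name="x" id="1"><value>10</value></rule>'
edited = '<rule name="x" id="1"><value>11</value></rule>'

print(on_disk == from_db)              # plain text comparison reports a difference
print(same_infoset(on_disk, from_db))  # canonical comparison treats them as equal
print(same_infoset(on_disk, edited))   # a real value change is still detected
```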

I've not used the patching ability of the tool, only XmlDiff.

There are several nice commercial XML diff tools on the market, but I don't know of any that provide a code or scripting API. That would be a nice value-add feature!
