Using a schema to reorder the elements of an XML document in conformance with the schema

Problem description

Say I have an XML document (represented as text, a W3C DOM, whatever), and also an XML Schema. The XML document has all the right elements as defined by the schema, but in the wrong order.

How do I use the schema to "re-order" the elements in the document to conform to the ordering defined by the schema?

I know that this should be possible, probably using XSOM, since the JAXB XJC code generator annotates its generated classes with the correct serialization order of the elements.

However, I'm not familiar with the XSOM API, and it's pretty dense, so I'm hoping one of you lot has some experience with it, and can point me in the right direction. Something like "what child elements are permitted inside this parent element, and in what order?"


Let me give an example.

I have an XML document like this:

<A>
   <Y/>
   <X/>
</A>

I have an XML Schema which says that the contents of <A> must be an <X> followed by a <Y>. Now clearly, if I try to validate the document against the schema, it fails, since the <X> and <Y> are in the wrong order. But I know my document is "wrong" in advance, so I'm not using the schema to validate just yet. However, I do know that my document has all of the correct elements as defined by the schema, just in the wrong order.

What I want to do is to programmatically examine the Schema (probably using XSOM - which is an object model for XML Schema), and ask it what the contents of <A> should be. The API will expose the information that "you need an <X> followed by a <Y>".

So I take my XML document (using a DOM API) and re-arrange it accordingly, so that the document will now validate against the schema.

It's important to understand what XSOM is here - it's a Java API which represents the information contained in an XML Schema, not the information contained in my instance document.

What I don't want to do is generate code from the schema, since the schema is unknown at build time. Furthermore, XSLT is no use, since the correct ordering of the elements is determined solely by the data dictionary contained in the schema.

Hopefully that's now explicit enough.

Solution

Your problem translates to this: you have an XML file that doesn't match the schema and you want to transform it into something that's valid.

With XSOM, you can read the structure in the XSD and perhaps analyze the XML, but you would still need an additional mapping from the invalid form to the valid form. Using a stylesheet would be much easier, because you would walk through the XML, using XPath expressions to select the elements in the proper order. For an XML document where apples must come before pears, the stylesheet would first copy the apple nodes (/Fruit/Apple) and then the pear nodes. That way, regardless of the order in the old file, they would be in the correct order in the new file.
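The apples-before-pears idea can be sketched with the XSLT support that ships with the JDK. This is only an illustration under assumptions: the class name FruitReorder and the hand-written stylesheet are hypothetical, and the stylesheet covers just the /Fruit/Apple and /Fruit/Pear case mentioned above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class FruitReorder {
    // Hypothetical hand-written stylesheet: an identity template copies
    // everything as-is, and a more specific template for Fruit emits its
    // Apple children first and its Pear children second.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='Fruit'>"
      + "<xsl:copy>"
      + "<xsl:apply-templates select='@*'/>"
      + "<xsl:apply-templates select='Apple'/>"
      + "<xsl:apply-templates select='Pear'/>"
      + "</xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the given XML text and returns the result.
    static String reorder(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // The input has Pear before Apple; the output has Apple first.
        System.out.println(reorder("<Fruit><Pear/><Apple/></Fruit>"));
    }
}
```

Note that this Fruit template only copies Apple and Pear children, so any other child of Fruit would be dropped. That is acceptable only if the document really contains nothing but the elements the schema allows.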

What you could do with XSOM is read the XSD and generate a stylesheet that will re-order the data, then transform the XML using that stylesheet. Once XSOM has generated a stylesheet for the XSD, you can re-use the stylesheet until the XSD is modified or another XSD is needed.
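A rough sketch of the generation step, assuming the child order for each parent element has already been extracted from the XSD (with XSOM or otherwise) into a map. The class StylesheetGenerator and its methods are hypothetical names, element names are assumed to be unprefixed, and only plain sequences are covered.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class StylesheetGenerator {
    // Builds a stylesheet with an identity template plus one template per
    // parent element that applies templates to its children in schema order.
    // childOrder maps a parent element name to its ordered child names
    // (assumed to have been read from the XSD beforehand).
    static String generate(Map<String, List<String>> childOrder) {
        StringBuilder sb = new StringBuilder();
        sb.append("<xsl:stylesheet version='1.0' ")
          .append("xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>");
        sb.append("<xsl:template match='@*|node()'><xsl:copy>")
          .append("<xsl:apply-templates select='@*|node()'/>")
          .append("</xsl:copy></xsl:template>");
        for (Map.Entry<String, List<String>> e : childOrder.entrySet()) {
            sb.append("<xsl:template match='").append(e.getKey()).append("'><xsl:copy>");
            sb.append("<xsl:apply-templates select='@*'/>");
            for (String child : e.getValue()) {
                sb.append("<xsl:apply-templates select='").append(child).append("'/>");
            }
            sb.append("</xsl:copy></xsl:template>");
        }
        sb.append("</xsl:stylesheet>");
        return sb.toString();
    }

    // Runs the generated stylesheet against an instance document.
    static String transform(String xslt, String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Map<String, List<String>> order = new LinkedHashMap<>();
        order.put("A", Arrays.asList("X", "Y")); // schema says: X, then Y
        String xslt = generate(order);           // generate once, re-use many times
        System.out.println(transform(xslt, "<A><Y/><X/></A>"));
    }
}
```

The generated string could be cached or written to disk and re-used until the XSD changes. Choice groups, repeated particles, and namespaces would need extra handling that this sketch leaves out.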

Of course, you could use XSOM to copy the nodes in the right order directly. But since this means your code has to walk through all nodes and child nodes itself, it might take some time to finish. A stylesheet would do the same, but the transformer will be able to process it all faster. It can work directly on the data, while the Java code would have to get and set every node through the document object's API.
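For comparison, the direct approach could look roughly like the sketch below, using only the W3C DOM API from the JDK. The expected order is passed in as a plain list and is assumed to have been obtained from the schema; DomReorder and its methods are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomReorder {
    // Moves the element children of parent so that they follow expectedOrder.
    // expectedOrder is assumed to come from the schema (for example via XSOM).
    static void reorderChildren(Element parent, List<String> expectedOrder) {
        // Snapshot the children first, because moving nodes changes sibling links.
        List<Element> children = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) children.add((Element) n);
        }
        for (String name : expectedOrder) {
            for (Element child : children) {
                if (name.equals(child.getNodeName())) {
                    // appendChild on a node already in the tree moves it to the end.
                    parent.appendChild(child);
                }
            }
        }
    }

    // Returns the names of the element children in their current order.
    static List<String> childNames(Element parent) {
        List<String> names = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) names.add(n.getNodeName());
        }
        return names;
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<A><Y/><X/></A>");
        reorderChildren(doc.getDocumentElement(), Arrays.asList("X", "Y"));
        System.out.println(childNames(doc.getDocumentElement()));
    }
}
```

Children whose names are not in the list stay where they were, ahead of the moved elements. A full solution would call reorderChildren recursively for every element that has a content model in the schema.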


So, I would use XSOM to generate a stylesheet for the XSD that simply copies the XML node by node, and re-use that stylesheet over and over again. The stylesheet would only need to be regenerated when the XSD changes, and it would perform faster than Java code that walks through the nodes itself. The stylesheet doesn't care about the order of its input, so the output would always end up in the right order.

To make it more interesting, you could skip XSOM altogether and try to work with a stylesheet that reads the XSD and generates another stylesheet from it. This generated stylesheet would copy the XML nodes in the exact order defined in the XSD. Would it be complex? Actually, the first stylesheet would need to generate a template for every element and make sure the child elements of that element are processed in the correct order.

When I think about this, I wonder if this has been done before already. It would be very generic and would be able to handle almost every XSD/XML.

Let's see... Using "//xsd:element/@name" you would get all element names in the schema. Every unique name would need to be translated to a template. Within these templates, you would need to process the child nodes of the specific element, which are slightly more complex to get. Elements can have a reference, which you would need to follow. Otherwise, get all of the element's child xsd:element nodes.
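As a small illustration of that lookup, the XPath expression can also be evaluated from Java against the XSD parsed as an ordinary DOM document. This is a sketch with assumptions: SchemaNames, SAMPLE_XSD, and childOrder are hypothetical, the sample schema only uses inline declarations inside an xsd:sequence, and ref attributes and named types are not followed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SchemaNames {
    static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";

    // Hypothetical sample schema: A contains X followed by Y.
    static final String SAMPLE_XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='A'><xsd:complexType><xsd:sequence>"
      + "<xsd:element name='X'/><xsd:element name='Y'/>"
      + "</xsd:sequence></xsd:complexType></xsd:element>"
      + "</xsd:schema>";

    // Binds the "xsd" prefix used in the XPath expressions to the XML Schema namespace.
    static final NamespaceContext XSD_CONTEXT = new NamespaceContext() {
        @Override public String getNamespaceURI(String prefix) {
            return "xsd".equals(prefix) ? XSD_NS : XMLConstants.NULL_NS_URI;
        }
        @Override public String getPrefix(String uri) {
            return XSD_NS.equals(uri) ? "xsd" : null;
        }
        @Override public Iterator<String> getPrefixes(String uri) {
            List<String> prefixes = XSD_NS.equals(uri)
                ? Collections.singletonList("xsd") : Collections.<String>emptyList();
            return prefixes.iterator();
        }
    };

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // required for the xsd: prefix to match
        return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Evaluates an XPath expression that selects attributes and returns their values.
    static List<String> select(Document xsd, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(XSD_CONTEXT);
        NodeList nodes = (NodeList) xpath.evaluate(expression, xsd, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) values.add(nodes.item(i).getNodeValue());
        return values;
    }

    static List<String> allElementNames(Document xsd) throws Exception {
        return select(xsd, "//xsd:element/@name");
    }

    // Child order for an inline sequence; parent is assumed to contain no quotes.
    static List<String> childOrder(Document xsd, String parent) throws Exception {
        return select(xsd, "//xsd:element[@name='" + parent
            + "']/xsd:complexType/xsd:sequence/xsd:element/@name");
    }

    public static void main(String[] args) throws Exception {
        Document xsd = parse(SAMPLE_XSD);
        System.out.println(allElementNames(xsd));
        System.out.println(childOrder(xsd, "A"));
    }
}
```

Each name returned by allElementNames would become one template, and childOrder would supply the order in which that template processes its children.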
