Validating XML with XSDs ... but still allow extensibility


Question

Maybe it's me, but it appears that if you have an XSD

<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="User">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="GivenName" />
                <xs:element name="SurName" />
            </xs:sequence>
            <xs:attribute name="ID" type="xs:unsignedByte" use="required" />
        </xs:complexType>
    </xs:element>
</xs:schema>

that defines the schema for this document:

<?xml version="1.0" encoding="utf-8" ?>
<User ID="1">
    <GivenName></GivenName>
    <SurName></SurName>
</User>

it would fail to validate if you added another element, say EmailAddress, and mixed up the order:

<?xml version="1.0" encoding="utf-8" ?>
<User ID="1">
    <SurName></SurName>
    <EmailAddress></EmailAddress>
    <GivenName></GivenName>
</User>

I don't want to add EmailAddress to the schema and mark it as optional.

I just want an XSD that validates the bare minimum requirements that the document must meet.

Is there a way to do this?

marc_s pointed out below that you can use xs:any inside xs:sequence to allow more elements. Unfortunately, you then have to maintain the order of the elements.

Alternatively, I can use xs:all, which doesn't enforce the order of elements but, alas, doesn't allow me to place xs:any inside it.

Recommended answer

Your issue has a resolution, but it will not be pretty. Here's why:

You've touched on the very soul of W3C XML Schema. What you are asking for (variable order and a variable number of unknown elements) violates the hardest, yet most basic, principle of XSD: the rule of non-ambiguity or, more formally, the Unique Particle Attribution constraint:

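"A content model must be formed such that during validation [..] each item in the sequence in turn can be uniquely determined without examining the content or attributes of that item, and without any information about the items in the remainder of the sequence."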

In plain English: when an XML document is validated and the XSD processor encounters <SurName>, it must be able to validate it without first checking whether it is followed by <GivenName>, i.e., without looking ahead. In your scenario, this is not possible. The rule exists so that validators can be implemented as finite state machines, which should keep implementations fairly simple and fast.
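
To make the constraint concrete, here is a minimal sketch of a content model that runs into it. It is an illustration rather than part of the original question, and it assumes XSD 1.0 and a schema without a target namespace, so that SurName also matches the ##any wildcard:

```xml
<!-- Ambiguous: when the validator sees <SurName>, it cannot tell whether
     the item belongs to the optional element particle or to the wildcard. -->
<xs:sequence>
    <xs:element name="SurName" minOccurs="0" />
    <xs:any minOccurs="0" processContents="lax" namespace="##any" />
</xs:sequence>
```

A processor that enforces Unique Particle Attribution is expected to reject this schema itself, before any instance document is validated.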

This is one of the most-debated issues. It is a heritage of SGML and DTD (where content models must be deterministic) and of XML itself, in which the order of elements is significant by default (so doing the opposite, making the order unimportant, is hard).

As marc_s already suggested, RELAX NG is an alternative that allows non-deterministic content models. But what can you do if you're stuck with W3C XML Schema?

You've already noticed that xs:all is very restrictive. The reason is simple: the same non-ambiguity rule applies, which is why (in XSD 1.0) xs:any, minOccurs/maxOccurs values larger than one, and nested sequences are not allowed inside it.
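
For reference, this is roughly what the xs:all variant of the schema from the question would look like. It is a sketch assembled from the declarations in the question, not code from the original answer:

```xml
<xs:element name="User">
    <xs:complexType>
        <!-- xs:all accepts GivenName and SurName in either order,
             each exactly once, and nothing else. -->
        <xs:all>
            <xs:element name="GivenName" />
            <xs:element name="SurName" />
        </xs:all>
        <xs:attribute name="ID" type="xs:unsignedByte" use="required" />
    </xs:complexType>
</xs:element>
```

This solves the ordering half of the problem, but an extra element such as EmailAddress is still rejected, because no wildcard can be placed inside xs:all.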

Also, you may have tried all sorts of combinations of choice, sequence and any. The error that the Microsoft XSD processor throws when it encounters such an invalid situation is:

Error: Multiple definition of element 'http://example.com/Chad:SurName' causes the content model to become ambiguous. A content model must be formed such that during validation of an element information item sequence, the particle contained directly, indirectly or implicitly therein with which to attempt to validate each item in the sequence in turn can be uniquely determined without examining the content or attributes of that item, and without any information about the items in the remainder of the sequence.

This is excellently explained in O'Reilly's XML Schema (yes, the book has its flaws). Fortunately, parts of the book are available online. I highly recommend reading section 7.4.1.3 on the Unique Particle Attribution Rule; its explanations and examples are much clearer than mine could ever be.

In most cases it is possible to go from a non-deterministic design to a deterministic one. The result usually doesn't look pretty, but it is a solution if you have to stick with W3C XML Schema and/or absolutely must allow non-strict rules for your XML. The nightmare in your situation is that you want to enforce one thing (two predefined elements) while keeping everything else very loose (order doesn't matter, and anything can go between, before and after them). If I skip the good advice and take you directly to a solution, it looks as follows:

<xs:element name="User">
    <xs:complexType>
        <xs:sequence>
            <!-- Optional foreign-namespace element before the required pair -->
            <xs:any minOccurs="0" processContents="lax" namespace="##other" />
            <xs:choice>
                <!-- Permutation 1: GivenName, then SurName -->
                <xs:sequence>
                    <xs:element name="GivenName" />
                    <xs:any minOccurs="0" processContents="lax" namespace="##other" />
                    <xs:element name="SurName" />
                </xs:sequence>
                <!-- Permutation 2: SurName, then GivenName -->
                <xs:sequence>
                    <xs:element name="SurName" />
                    <xs:any minOccurs="0" processContents="lax" namespace="##other" />
                    <xs:element name="GivenName" />
                </xs:sequence>
            </xs:choice>
            <!-- Only the trailing wildcard may use ##any -->
            <xs:any minOccurs="0" processContents="lax" namespace="##any" />
        </xs:sequence>
        <xs:attribute name="ID" type="xs:unsignedByte" use="required" />
    </xs:complexType>
</xs:element>

The code above actually just works, but there are a few caveats. The first is the use of xs:any with ##other as its namespace. You cannot use ##any, except for the last wildcard, because that would allow elements like GivenName to match the wildcard instead, which makes the definition of User ambiguous.
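
As an illustration, an instance along the following lines should validate against the User declaration above. It is a sketch under stated assumptions: the schema is taken to have targetNamespace="http://example.com/Chad" (the namespace that appears in the error message) with elementFormDefault="qualified", and the ext prefix and its URI are hypothetical.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<User ID="1"
      xmlns="http://example.com/Chad"
      xmlns:ext="http://example.com/extensions">
    <SurName></SurName>
    <!-- Matches the ##other wildcard because it lives in a foreign namespace -->
    <ext:EmailAddress></ext:EmailAddress>
    <GivenName></GivenName>
</User>
```

Two consequences are worth keeping in mind. In XSD 1.0, ##other only matches namespace-qualified elements from a namespace other than the target namespace, so an unqualified EmailAddress fits only the trailing ##any slot. And because the wildcards have no maxOccurs, each slot takes at most one extension element; adding maxOccurs="unbounded" to them is the likely fix if more are needed.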

The second caveat is that if you want to use this trick with more than two or three required elements, you'll have to write out every permutation. A maintenance nightmare. That's why I came up with the following:

Change your definition. This has the advantage of being clearer to your readers or users, and it is also easier to maintain. A whole string of solutions is explained on XFront (you may already have seen a less readable link to it in Oleg's post). It's an excellent read, but most of it does not take into account that you have a minimum requirement of two elements inside the variable content container.

The current best-practice approach for your situation (which comes up more often than you may imagine) is to split your data into required and non-required fields. You can add an element <Required> or, the other way around, add an element <ExtendedInfo> (or call it Properties, or OptionalData). This looks as follows:

<xs:element name="User2">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="GivenName" />
            <xs:element name="SurName" />
            <xs:element name="ExtendedInfo" minOccurs="0">
                <xs:complexType>
                    <xs:sequence>
                        <xs:any minOccurs="0" maxOccurs="unbounded" processContents="lax" namespace="##any" />
                    </xs:sequence>
                </xs:complexType>
            </xs:element>
        </xs:sequence>
    </xs:complexType>
</xs:element>
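
A document that should validate against this User2 declaration could look like the sketch below. It assumes the declaration sits in a schema without a target namespace, as in the question; the Phone element is a hypothetical extension added for illustration.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<User2>
    <GivenName></GivenName>
    <SurName></SurName>
    <!-- Everything inside ExtendedInfo is matched by the lax wildcard,
         so the children may appear in any order and in any number. -->
    <ExtendedInfo>
        <Phone></Phone>
        <EmailAddress></EmailAddress>
    </ExtendedInfo>
</User2>
```

Note that User2 as written declares no ID attribute, so the instance omits it; copy the xs:attribute line from the earlier User declaration if you need it.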

This may seem less than ideal at the moment, but let it grow on you a bit. Having an ordered set of fixed elements isn't that big a deal. You're not the only one to complain about this apparent deficiency of W3C XML Schema but, as I said earlier, if you have to use it, you'll either have to live with its limitations or accept the burden of developing around them at a higher cost of ownership.

I'm sure you know this already, but the order of attributes is undetermined by default. If all your content is of simple types, you can alternatively make more abundant use of attributes.
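
A rough sketch of that attribute-based alternative follows. The element name User3 and the use of xs:anyAttribute are assumptions added for illustration; the original answer only mentions the idea and gives no schema for it.

```xml
<xs:element name="User3">
    <xs:complexType>
        <!-- Required data as attributes: order never matters -->
        <xs:attribute name="ID" type="xs:unsignedByte" use="required" />
        <xs:attribute name="GivenName" type="xs:string" use="required" />
        <xs:attribute name="SurName" type="xs:string" use="required" />
        <!-- Any additional attribute, e.g. EmailAddress, is tolerated -->
        <xs:anyAttribute processContents="lax" namespace="##any" />
    </xs:complexType>
</xs:element>
```

With this declaration, <User3 SurName="" EmailAddress="" GivenName="" ID="1" /> should validate regardless of attribute order, at the cost of restricting every value to simple content.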

Whatever approach you take, you will lose a lot of the verifiability of your data. It's often better to allow content providers to add content types, but only when those can be verified. You can do this by switching from lax to strict processing and by making the types themselves stricter. Being too strict isn't good either, though: the right balance depends on your ability to judge the use cases you're up against and to weigh them against the trade-offs of particular implementation strategies.
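
As a sketch of what switching to strict processing might look like for an extension container like ExtendedInfo (the global EmailAddress declaration and its xs:string type are assumptions added for illustration):

```xml
<!-- Extension elements must now have a global declaration to be accepted -->
<xs:element name="EmailAddress" type="xs:string" />

<xs:element name="ExtendedInfo">
    <xs:complexType>
        <xs:sequence>
            <!-- strict: each child is validated against its global declaration;
                 a child with no declaration makes the document invalid -->
            <xs:any minOccurs="0" maxOccurs="unbounded" processContents="strict" namespace="##any" />
        </xs:sequence>
    </xs:complexType>
</xs:element>
```

Content providers can then extend the vocabulary by supplying their own schema with additional global declarations, which keeps the extensions verifiable.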
