How to verify that one XSD schema is a subset of another XSD schema?


Question

How can I verify that one XSD schema is a subset of another XSD schema?

We are creating a system-of-systems application using a collection of "blueprint" XSD schemas (which define all possible inputs or outputs available to a subcomponent). Many subcomponents are being implemented, and these subcomponents pass data among themselves using XML files. Each subcomponent creates a subset of the relevant blueprint XSD schema (to indicate which of the possible inputs or outputs it has chosen to implement). Any XML data file that validates against a subset XSD schema must also validate against the blueprint XSD schema, but the reverse is not true (the subset XSD schema may omit some "optional" or "choice" XML elements from the blueprint XSD schema, and it may further restrict the allowed data values on an existing XML tag). The system will validate all XML inputs to a subcomponent against that subcomponent's subset XSD schema (to flag any bad inputs and isolate the source of data-related problems).

During testing, we intend to verify that each subcomponent's subset XSD schema is truly a subset of the associated blueprint XSD schema, but we have no automated means of performing this verification. These XSD schemas are too large and ugly to test by hand. It would be nice to have a kind of "validate XSD file 1 against XSD file 2" command, similar to how Java can validate an XML file against an XSD schema. We want to confirm that each subcomponent's subset XSD schema will not allow any combination of XML input/output that would violate the blueprint XSD schema. With this schema-to-schema capability, it would also be very helpful to verify whether the output XML from subcomponent A is appropriate as input to subcomponent B (we can easily validate a single output XML against an XSD schema, but we want to confirm that all possible XML outputs from subcomponent A will validate against subcomponent B's XSD schema).

Helpful information: This application is a collection of Java 6 applications implemented as OSGi bundles and compiled/executed using Maven 2.2.1. There are no requirements for using any specific development IDE. The system is being tested upon a Microsoft Windows XP environment, but there are plans to execute this system upon other environments as well (so a cross-platform solution would be preferred).

Recommended answer

The simplest way to ensure the relationship you want is to derive the types of the subset schemas by restriction from the types of the blueprint schema. It sounds as if that boat has already sailed, though.

Like others here, I am not aware of any tools that do this out of the box (although if Petru Gardea says QT Assistant can, it's worth following up).

One complication is that there are two different ways to view the subset/superset relation you want to verify: (1) every document (or element) accepted as valid by schema 1 is also accepted as valid by schema 2 (without reference to the type assignments made), or (2) the typed documents produced by validation (in what the spec calls the post-schema-validation infoset) against schemas 1 and 2 stand in an appropriate relation to each other: if an element or attribute is valid in tree 1, it's valid in tree 2; the type assigned to it in tree 1 is a restriction of the type assigned to it in tree 2; etc. If schemas 1 and 2 were developed independently, the chances that their types are related by derivation are poor, so I guess you have the first approach to the question in mind.

The problem, though, is definitely decidable, in either form. For any schema (I'm using the term carefully) there are by definition a finite number of types and a finite number of element names declared; it follows that there is a finite number (possibly large) of element name / type pairs.

The algorithm could go something like this.

  1. Start with the expected root element. (If there are multiple possible root elements, then in the general case you'll need to run this check for each of them.) If the expected root element is E, with type T1 in schema 1 and type T2 in schema 2, then place the task "Compare type T1 and T2" in a queue of open tasks. The list of tasks already completed will be empty.

To compare two complex types T1 and T2:

  • Check the sets of attributes declared for T1 and T2 for a subset/superset relation between their names. Make sure no attribute required in the intended superset is absent or optional in the intended subset.

Each attribute A declared for both T1 and T2 will be assigned a type (call them ST1 and ST2). If ST1 = ST2, do nothing; otherwise, add the task "Compare simple types ST1 and ST2" to the queue of open tasks, unless it's on the list of comparisons already completed.
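
A minimal sketch of this attribute check, assuming each complex type's attributes have already been extracted into a plain dictionary mapping attribute name to a `(use, type_name)` pair. The function name `compare_attributes` and the dictionary layout are hypothetical, and attribute wildcards, prohibited attributes, and default values are ignored.

```python
def compare_attributes(sub_attrs, sup_attrs, done=frozenset()):
    """Compare attribute sets of an intended subset type and superset type.

    Both arguments map attribute name -> (use, type_name), where use is
    "required" or "optional" (a hypothetical, simplified encoding).
    Returns (problems, tasks): human-readable problems, plus the
    simple-type pairs that still need to be compared.
    """
    problems, tasks = [], []
    for name, (use, type_name) in sorted(sub_attrs.items()):
        if name not in sup_attrs:
            problems.append(f"attribute '{name}' is not declared in the superset")
            continue
        sup_type = sup_attrs[name][1]
        # Identical types need no work; otherwise queue a comparison
        # unless that pair has already been handled.
        if type_name != sup_type and (type_name, sup_type) not in done:
            tasks.append((type_name, sup_type))
    for name, (use, _) in sorted(sup_attrs.items()):
        if use != "required":
            continue
        if name not in sub_attrs:
            problems.append(f"required attribute '{name}' is absent in the subset")
        elif sub_attrs[name][0] != "required":
            problems.append(f"required attribute '{name}' is optional in the subset")
    return problems, tasks
```

The returned `tasks` list corresponds to the "Compare simple types ST1 and ST2" entries that would be appended to the queue of open tasks.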

Now check the sequences of children that are possible in T1 and T2 -- as 13ren suggests in a comment, this is tractable since content models are essentially regular expressions which use the set of element names as their alphabet; the languages they define are therefore regular, and the subset/superset relation is decidable for regular languages.
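
To illustrate why the inclusion test is decidable, here is a small sketch that treats a content model as a regular expression over element names and checks inclusion by exploring pairs of Brzozowski derivatives. This is one standard technique for regular-language inclusion, not necessarily what the answer had in mind. The tuple encoding and the helper names (`el`, `seq`, `alt`, `star`, `opt`, `is_subset`) are hypothetical, and real XSD features such as numeric occurrence bounds, wildcards, and substitution groups are not handled.

```python
from collections import deque

EMPTY, EPS = ("empty",), ("eps",)   # no sequence at all / the empty sequence

def el(name):
    return ("el", name)

def seq(a, b):
    if a == EMPTY or b == EMPTY:
        return EMPTY
    if a == EPS:
        return b
    if b == EPS:
        return a
    return ("seq", a, b)

def alt(*parts):
    # Flatten nested choices and drop duplicates so equal models compare equal.
    flat = set()
    for p in parts:
        if p[0] == "alt":
            flat |= p[1]
        elif p != EMPTY:
            flat.add(p)
    if not flat:
        return EMPTY
    if len(flat) == 1:
        return next(iter(flat))
    return ("alt", frozenset(flat))

def star(a):
    if a in (EMPTY, EPS):
        return EPS
    return a if a[0] == "star" else ("star", a)

def opt(a):
    return alt(EPS, a)

def nullable(r):
    """True if the model accepts the empty sequence of children."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "el"):
        return False
    if tag == "seq":
        return nullable(r[1]) and nullable(r[2])
    return any(nullable(p) for p in r[1])  # alt

def deriv(r, name):
    """Model for what may follow after one child element called `name`."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "el":
        return EPS if r[1] == name else EMPTY
    if tag == "seq":
        first = seq(deriv(r[1], name), r[2])
        return alt(first, deriv(r[2], name)) if nullable(r[1]) else first
    if tag == "alt":
        return alt(*(deriv(p, name) for p in r[1]))
    return seq(deriv(r[1], name), r)  # star

def is_subset(sub, sup, names):
    """True if every child sequence accepted by `sub` is accepted by `sup`."""
    seen, queue = {(sub, sup)}, deque([(sub, sup)])
    while queue:
        a, b = queue.popleft()
        if nullable(a) and not nullable(b):
            return False  # some sequence ends validly in sub but not in sup
        for n in names:
            nxt = (deriv(a, n), deriv(b, n))
            if nxt[0] != EMPTY and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True
```

For example, a blueprint model "title, author*, (isbn | issn)?" could be written as `seq(el("title"), seq(star(el("author")), opt(alt(el("isbn"), el("issn")))))`, and a subset model "title, author, isbn" as `seq(el("title"), seq(el("author"), el("isbn")))`; `is_subset` would then be called with the list of all element names used by either model.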

Each possible child element C is assigned both an element declaration and a type definition by the parent types T1 and T2. Let us call them ED1, ED2, CT1, and CT2. Every child of the same name will have the same type, but different children may match different element declarations. So for any possible name, there will be just one pair of types CT1 and CT2, but there may be multiple pairs ED1 and ED2 (and the analysis will need to be careful to make sure they are matched up correctly; that might be hard to automate).

If CT1 = CT2, do nothing; otherwise put "Compare types CT1 and CT2" onto the open task queue, unless the comparison has already been performed.

If ED1 and ED2 are structurally identical, do nothing; otherwise put the task of comparing them into the task queue (unless it's already been done).

To compare two simple types ST1 and ST2, compare either their lexical spaces (if you want the first definition of the subset/superset relation on schemas) or their value spaces (if you want the second). If ST1 and ST2 are both restrictions of the same primitive type, you may be able to compare the set of effective facet-based restrictions on them easily. The pattern facet may complicate matters, but because it defines a set of regular expressions, the subset/superset relation is decidable for it.
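
As a rough illustration of the facet comparison for two numeric types restricted from the same primitive type, the sketch below assumes the effective facets have already been collected into dictionaries. Only `minInclusive`, `maxInclusive`, and `enumeration` are considered; the function names are hypothetical, the pattern facet is not handled, and the check is deliberately conservative (it may answer `False` for a pair that is in fact a subset).

```python
import math

def accepts(facets, value):
    """True if `value` satisfies the (simplified) facet dictionary."""
    enum = facets.get("enumeration")
    if enum is not None and value not in enum:
        return False
    low = facets.get("minInclusive", -math.inf)
    high = facets.get("maxInclusive", math.inf)
    return low <= value <= high

def facets_subset(sub, sup):
    """Conservatively decide whether every value allowed by `sub` is allowed by `sup`."""
    if sub.get("enumeration") is not None:
        # A finite value set can be checked value by value.
        return all(accepts(sup, v) for v in sub["enumeration"] if accepts(sub, v))
    if sup.get("enumeration") is not None:
        # sub is a range and sup is a finite list: give up rather than guess.
        return False
    return (sub.get("minInclusive", -math.inf) >= sup.get("minInclusive", -math.inf)
            and sub.get("maxInclusive", math.inf) <= sup.get("maxInclusive", math.inf))
```

A fuller version would need one such rule per facet and per primitive type, and would compare lexical spaces or value spaces depending on which definition of the subset relation is wanted.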

To compare two element declarations, you need to compare each of the properties for the element declaration and check for the desired subset/superset relation.
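
The overall bookkeeping described above (a queue of open comparison tasks plus a record of comparisons already done) can be sketched as follows. This is a toy under strong assumptions: each schema is reduced to a hypothetical dictionary mapping a complex type name to its child element names and their types, the content-model check is replaced by a plain "child name is allowed" test, and any type name missing from the dictionary is treated as a simple or built-in type that matches only an identically named type.

```python
from collections import deque

def check_schema_subset(sub_schema, sup_schema, root_sub, root_sup):
    """Walk paired types from the roots and return a list of findings."""
    open_tasks = deque([(root_sub, root_sup)])
    done = set()
    findings = []
    while open_tasks:
        task = open_tasks.popleft()
        if task in done:
            continue  # recursive types would otherwise loop forever
        done.add(task)
        t1, t2 = task
        kids1, kids2 = sub_schema.get(t1), sup_schema.get(t2)
        if kids1 is None or kids2 is None:
            # Placeholder for the simple-type comparison step.
            if t1 != t2:
                findings.append(f"types {t1} and {t2} need a closer comparison")
            continue
        for child, ct1 in sorted(kids1.items()):
            if child not in kids2:
                findings.append(f"{t1}: child <{child}> not allowed by {t2}")
                continue
            # Same-named complex types in two schemas may still differ,
            # so the pair is always queued unless already compared.
            pair = (ct1, kids2[child])
            if pair not in done:
                open_tasks.append(pair)
    return findings
```

An empty result means this simplified walk found nothing that the superset would reject. A real implementation would plug the attribute, content-model, simple-type, and element-declaration comparisons into the body of the loop.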

As you can see, it's complex and tedious enough that you really want to automate this analysis, and it's also complex enough that it's easy to see why it's not widely offered as an out-of-the-box function. But it would certainly be interesting to code.
