Scope of XML languages defined by DTD vs XSD


Question

Does the following proposition hold: for every DTD there is an XSD that defines exactly the same language, and for every XSD there is a DTD that defines exactly the same language? Or, put another way: is the collection of languages definable by DTDs exactly the collection of languages definable by XSDs?

Expanding on the question a little: an XML document is basically a large string, and a language is a set of strings. For example, the (infinite) set of all MathML documents is a language, and so is the set of all RSS documents, and so on. MathML (RSS, ...) is also a proper subset of the (infinite) set of all XML documents. You can use a DTD or an XSD to define such a subset of XML.

Now, every DTD defines exactly one language. But if you think of all possible DTDs, you get a set of languages. My question is: is this set exactly the same as the one you get from all possible XSDs? If so, then DTD and XSD are equivalent in the sense that the scope of XML languages defined by either is equal.

Why is this question important? If DTD and XSD are equivalent, then it is possible to write a program that takes a DTD as input and gives you an equivalent XSD, and another program that does the opposite. I know there are quite a few programs out there that claim to do exactly this, but I doubt that it is actually possible.

Recommended answer

An interesting question; well asked!

The answer is "no", in both directions.

Here is a DTD which has no equivalent in XSD:

<!ELEMENT e (#PCDATA | e)* >
<!ENTITY egbdf "Every good boy deserves favor.">

The set of character sequences accepted by this DTD includes both <e/> and <e>&egbdf;</e>, but not <e>&beadgcf;</e>.

Since XSD validation operates on an information set in which all entities have already been expanded, no XSD schema can distinguish the third case from the second.
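To make the entity point concrete, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It is a non-validating parser, so it only illustrates entity expansion and the undeclared-entity error, not DTD or XSD validation. The helper name `parse_text` and the inline `DOCTYPE` wrapper are assumptions introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Internal subset carrying the declarations from the DTD above.
DOCTYPE = (
    '<!DOCTYPE e [\n'
    '  <!ELEMENT e (#PCDATA | e)*>\n'
    '  <!ENTITY egbdf "Every good boy deserves favor.">\n'
    ']>\n'
)


def parse_text(body: str) -> str:
    """Parse DOCTYPE + body and return the text of the root element."""
    return ET.fromstring(DOCTYPE + body).text


# The entity reference is expanded by the parser ...
expanded = parse_text("<e>&egbdf;</e>")
# ... so the result is the same as if the text had been typed literally.
literal = parse_text("<e>Every good boy deserves favor.</e>")

# A reference to an entity the DTD does not declare is rejected.
try:
    parse_text("<e>&beadgcf;</e>")
    undeclared_rejected = False
except ET.ParseError:
    undeclared_rejected = True
```

After parsing, `expanded` and `literal` are the same string, which is all a schema processor working on the expanded information set would get to see.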

A second area where DTDs can express constraints not expressible in XSD involves NOTATION types. I won't give an example; the details are too complicated for me to remember them correctly without looking them up, and not interesting enough to make me want to do so.

A third area: DTDs treat namespace attributes (aka namespace declarations) and general attributes in the same way; a DTD can therefore constrain the appearance of namespace declarations in documents. An XSD schema cannot. The same applies to attributes in the xsi namespace.

If we ignore all of those issues, and formulate the question only with respect to character sequences containing no references to named entities other than the pre-defined entities lt, gt, etc., then the answer changes: for every DTD not involving NOTATION declarations, there is an XSD schema that accepts precisely the same set of documents after entity expansion, with 'same' defined in a way that ignores namespace attributes and attributes in the xsi namespace.

In the other direction, the areas of difference include these:

XSD is namespace aware: the following XSD schema accepts any instance of element e in the specified target namespace, regardless of what prefix is bound to that namespace in the document instance.

<xs:schema xmlns:xs="..." targetNamespace="http://example.com/nss/24397">
  <xs:element name="e" type="xs:string"/>
</xs:schema>

No DTD can successfully accept all and only the e elements in the given namespace.
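The prefix-independence described above can be sketched in Python with the standard library's namespace-aware `xml.etree.ElementTree`. This only shows how a namespace-aware processor names the element; it does not run XSD validation. The three sample documents and the prefixes `a` and `b` are made up for illustration.

```python
import xml.etree.ElementTree as ET

NS = "http://example.com/nss/24397"  # target namespace from the schema above

# Three spellings of "element e in that namespace" (prefixes are arbitrary).
doc_a = f'<a:e xmlns:a="{NS}">hello</a:e>'
doc_b = f'<b:e xmlns:b="{NS}">hello</b:e>'
doc_default = f'<e xmlns="{NS}">hello</e>'

# A namespace-aware parser reports the same expanded name for all three.
tags = {ET.fromstring(doc).tag for doc in (doc_a, doc_b, doc_default)}
```

All three documents collapse to the single expanded name `{http://example.com/nss/24397}e`. A DTD, by contrast, matches on the literal name as written (`a:e`, `b:e`, or `e`), so it would have to enumerate prefixes it cannot know in advance.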

XSD has a richer set of datatypes and can use datatypes to constrain elements as well as attributes. The following XSD schema has no equivalent DTD:

<xs:schema xmlns:xs="...">
  <xs:element name="e" type="xs:integer"/>
</xs:schema>

This schema accepts the document <e>42</e> but not the document <e>42d Street</e>. No DTD can make that distinction, because DTDs have no mechanism for constraining #PCDATA content. The closest DTD would be <!ELEMENT e (#PCDATA)>, which accepts both sample documents.
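As a rough illustration of the kind of check a datatype adds, the Python sketch below approximates the lexical form of xs:integer (an optional sign followed by ASCII decimal digits, after surrounding XML whitespace is dropped). It is a hand-written approximation, not an XSD validator, and the function name `looks_like_xs_integer` is hypothetical.

```python
import re

# Optional sign, then one or more ASCII digits.
_INTEGER = re.compile(r"[+-]?[0-9]+")


def looks_like_xs_integer(text: str) -> bool:
    """Approximate the xs:integer lexical check on element content."""
    # Drop leading/trailing XML whitespace (space, tab, CR, LF) first.
    trimmed = text.strip(" \t\r\n")
    return _INTEGER.fullmatch(trimmed) is not None


ok = looks_like_xs_integer("42")
bad = looks_like_xs_integer("42d Street")
```

Here `ok` is true and `bad` is false, whereas a DTD declaring `(#PCDATA)` has no place to hang such a test.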

XSD's xsi:type attribute allows in-document modifications of content models. The XSD schema described by the following schema document has no equivalent DTD:

<xs:schema xmlns:xs="...">
  <xs:complexType name="e">
    <xs:sequence>
      <xs:element ref="e" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="e2">
    <xs:sequence>
      <xs:element ref="e" minOccurs="2" maxOccurs="2"/>
    </xs:sequence>
  </xs:complexType>

  <xs:element name="e" type="e"/>
</xs:schema>

This schema accepts the document <e xmlns:xsi="..." xsi:type="e2"><e/><e/></e> and rejects the document <e xmlns:xsi="..." xsi:type="e2"><e/><e/><e/></e>. DTDs have no mechanism for making content models depend on an attribute value given in the document instance.
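The dependence of the content model on xsi:type can be emulated by hand for this one schema. The Python sketch below is not an XSD processor: it hard-codes the two types (`e`: any number of `e` children; `e2`: exactly two) and ignores everything else a real validator would check. The function name `valid_e` is hypothetical, and the sketch assumes the elided `xmlns:xsi="..."` stands for the standard XML Schema instance namespace.

```python
import xml.etree.ElementTree as ET

# Assumption: the standard XML Schema instance namespace (elided in the text).
XSI = "http://www.w3.org/2001/XMLSchema-instance"


def valid_e(elem: ET.Element) -> bool:
    """Hand-written emulation of the schema above, for illustration only."""
    if elem.tag != "e":
        return False
    # Without xsi:type, the declared type of element e is "e".
    declared = elem.get(f"{{{XSI}}}type", "e")
    if declared not in ("e", "e2"):
        return False
    children = list(elem)
    # Type e2 narrows the content model to exactly two e children.
    if declared == "e2" and len(children) != 2:
        return False
    return all(valid_e(child) for child in children)


two = f'<e xmlns:xsi="{XSI}" xsi:type="e2"><e/><e/></e>'
three = f'<e xmlns:xsi="{XSI}" xsi:type="e2"><e/><e/><e/></e>'
untyped_three = "<e><e/><e/><e/></e>"

results = [valid_e(ET.fromstring(d)) for d in (two, three, untyped_three)]
```

The same three-child content is accepted without xsi:type and rejected with `xsi:type="e2"`, which is exactly the attribute-driven switch a DTD content model cannot express.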

XSD wildcards allow the inclusion of arbitrary well-formed XML among the children of specified elements; the closest one can come to that with a DTD is to use an element declaration of the form <!ELEMENT e ANY>, which is not the same, because it requires declarations for all the elements that actually appear.

XSD 1.1 provides assertions and conditional type assignment, which have no analogues in DTDs.

There are probably other ways in which the expressive power of XSD exceeds that of DTDs, but I think the point has been illustrated adequately.

I think a fair summary would be: XSD can express everything DTDs can express, with the exception of entity declarations and special cases like namespace declarations and xsi:* attributes, because XSD was designed to be able to do so. So the loss of information when translating a DTD to an XSD schema document is relatively modest, well understood, and mostly involves things most vocabulary designers regard as DTD artefacts of no substantive interest.

XSD can express more than DTDs can, again because XSD was designed to do so. In the general case, translation from XSD to DTD necessarily involves loss of information (the set of documents accepted may need to be larger, or smaller, or to be an overlapping set). Different choices can be made about how to manage the loss of information, which gives the question "How does one best translate an XSD into DTD form?" a certain theoretical interest. (Very few people, however, seem to find it an interesting question in practice.)

All of this focuses, as did your question, on documents as character sequences, on languages as document sets, and on schema languages as generators of languages in that sense. Issues of maintainability, and information present in the schema that does not turn into differences in the extension of document sets (e.g. the treatment of class hierarchies in the document model), are left out of account.
