What is the reason for not allowing non-deterministic element declarations in DTDs and XSD schemas?


Problem description

The following declaration:

<!ELEMENT p ((b, a) | (b, c))>

and its XSD equivalent are both invalid because they are not deterministic, according to validators and a quick check of the spec(s). However, since every non-deterministic finite automaton has an equivalent deterministic finite automaton and since there are algorithms for converting NFAs into DFAs, what is the reason for prohibiting non-deterministic declarations?
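The premise of the question can be checked directly. The sketch below hand-builds a position automaton for `((b, a) | (b, c))`, confirms that it is non-deterministic (the start state has two transitions on `b`), and runs the textbook subset construction on it. The state numbering and the dictionary encoding are assumptions made for illustration. The code does not describe how any particular validator works.

```python
from collections import deque

# Hypothetical position automaton for the content model ((b, a) | (b, c)).
# State 0 is the start state; states 1..4 are the particles b, a, b, c in
# document order. Each entry maps an element name to the set of next states.
NFA = {
    0: {"b": {1, 3}},   # two different particles can match the first <b>
    1: {"a": {2}},
    3: {"c": {4}},
}
NFA_FINAL = {2, 4}


def is_deterministic(nfa):
    """True if no state has two outgoing transitions on the same element name."""
    return all(len(targets) == 1
               for moves in nfa.values() for targets in moves.values())


def determinize(nfa, start, final):
    """Subset construction: every DFA state is a frozenset of NFA states."""
    start_set = frozenset({start})
    dfa, dfa_final = {}, set()
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        if current in dfa:
            continue  # already expanded
        dfa[current] = {}
        if current & final:
            dfa_final.add(current)
        names = {name for s in current for name in nfa.get(s, {})}
        for name in sorted(names):
            target = frozenset(t for s in current
                               for t in nfa.get(s, {}).get(name, ()))
            dfa[current][name] = target
            queue.append(target)
    return dfa, start_set, dfa_final


def accepts(dfa, start, final, children):
    """Run the DFA over a sequence of child element names."""
    state = start
    for name in children:
        state = dfa[state].get(name)
        if state is None:
            return False
    return state in final


if __name__ == "__main__":
    dfa, start, final = determinize(NFA, 0, NFA_FINAL)
    print(is_deterministic(NFA))                     # False
    for state, moves in dfa.items():
        print(sorted(state), {n: sorted(t) for n, t in moves.items()})
    print(accepts(dfa, start, final, ["b", "a"]))    # True
    print(accepts(dfa, start, final, ["b", "c"]))    # True
    print(accepts(dfa, start, final, ["b", "b"]))    # False
```

The resulting DFA has four states. After reading `b` it is in the combined state `{1, 3}`, so it accepts the same language as the deterministic model `(b, (a | c))`. The determinism rule obliges the schema author to write that version by hand.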

Solution

There are two classes of possible answer to a question like this: technical and historical.

There is no sound technical reason for the determinism rules of XML DTDs or XSD.

It is because of the determinism rule that the intersection and union of two XSD content models are not guaranteed to be describable with a legal XSD content model. It is because of the determinism rule that some regular languages cannot be expressed as content models. It is because of the determinism rule that abstract types and substitution groups in XSD fall so far short of their potential in making vocabularies easily extensible. In short, the determinism rule contributes nothing to SGML, XML DTDs, or XSD but pointless complication (and, in the cases of SGML and XSD, crackpot terminology: ambiguity that is actually not ambiguity, and unique particle attribution instead of strong determinism -- why use five syllables when you can use nine?).
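The union point can be made concrete. For the plain operators used here, a content model is deterministic (in XSD terms, it satisfies Unique Particle Attribution) when no two particles that could match the next child carry the same element name. Equivalently, its position (Glushkov) automaton is deterministic. The sketch below computes first and follow sets for a tiny tuple-based content-model syntax. That syntax is invented for this example and is not any real library's API. The sketch shows that `(b, a)` and `(b, c)` each pass the check while their union fails it.

```python
from itertools import count

# Tiny content-model syntax invented for this sketch (hypothetical):
#   ("el", name) | ("seq", [m, ...]) | ("choice", [m, ...]) | ("star", m)


def glushkov(model):
    """Return (names, first, follow) for the positions (particles) of model."""
    counter = count(1)
    names = {}   # position -> element name
    follow = {}  # position -> positions that may match the next child

    def walk(m):
        """Return (first, last, nullable) for the sub-model m."""
        kind = m[0]
        if kind == "el":
            p = next(counter)
            names[p], follow[p] = m[1], set()
            return {p}, {p}, False
        if kind == "star":
            first, last, _ = walk(m[1])
            for p in last:            # a repetition may start over
                follow[p] |= first
            return first, last, True
        parts = [walk(child) for child in m[1]]
        if kind == "choice":
            return (set().union(*(f for f, _, _ in parts)),
                    set().union(*(l for _, l, _ in parts)),
                    any(n for _, _, n in parts))
        # kind == "seq": fold the parts from left to right
        first, last, nullable = set(), set(), True
        for f, l, n in parts:
            for p in last:
                follow[p] |= f
            if nullable:
                first |= f
            last = (last | l) if n else set(l)
            nullable = nullable and n
        return first, last, nullable

    first, _, _ = walk(model)
    return names, first, follow


def is_deterministic(model):
    """No two competing positions may carry the same element name."""
    names, first, follow = glushkov(model)

    def clash(positions):
        labels = [names[p] for p in positions]
        return len(labels) != len(set(labels))

    return not clash(first) and not any(clash(f) for f in follow.values())


if __name__ == "__main__":
    b, a, c = ("el", "b"), ("el", "a"), ("el", "c")
    ba, bc = ("seq", [b, a]), ("seq", [b, c])
    print(is_deterministic(ba), is_deterministic(bc))          # True True
    print(is_deterministic(("choice", [ba, bc])))               # False
    print(is_deterministic(("seq", [b, ("choice", [a, c])])))   # True
```

The check only judges one expression at a time. It cannot show whether some other, deterministic expression describes the same language. In this case `(b, (a | c))` does, but the answer's point is that such an expression does not always exist.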

That leaves the historical answer.

I preface these remarks by saying that for my part in this sorry history I apologize to those who care about these things. I tried to persuade the WG to get rid of the determinism rules in 1996 when we were designing XML DTDs, and I failed. I tried to persuade the XML Schema WG not to adopt them in 1998-2001 when we were designing XSD 1.0, and I failed. I tried to persuade that WG to get rid of them in 2001-2012 when XSD 1.1 was gradually (very, very gradually) coming into being, and I failed again. Sorry; I did try.

Those in the SGML working group who originally created the rule (in ISO 8879) and those in the XML and XML Schema working groups who voted to retain the rule have offered a variety of rationalizations when I have asked them over the years.

Some in the XML WG argued that the determinism rules offer a useful simplification of the parser-writer's task. Cynics have repeatedly suggested (in private) that the rule arose originally because influential members of the SGML WG couldn't figure out how to make backtracking work correctly; others in that WG have hotly denied the claim. (Interestingly, my recollection is that in the XML WG those who argued for determinism as simplifying the parser's task included James Clark, who criticizes the determinism rule in the message cited by chris in his answer to this question. I wish he had changed his mind before it was too late.) Others in the WG thought the determinism rule was bogus, but that we couldn't change it without an unacceptable incompatibility with SGML.
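On the backtracking point: a validator does not need to backtrack to handle a non-deterministic content model. It can carry forward the set of particles that could have matched so far, which is the standard way to simulate an NFA in a single left-to-right pass. The sketch below does this for `((b, a) | (b, c))` with a hand-numbered automaton. The numbering and the encoding are assumptions for illustration. The trace it returns shows that after the first `<b>` two particles remain live, which is exactly the situation the determinism rule forbids.

```python
# Hypothetical position automaton for ((b, a) | (b, c)): state 0 is the start,
# states 1..4 are the particles b, a, b, c in document order.
TRANSITIONS = {
    (0, "b"): {1, 3},  # both <b> particles are candidates for the first child
    (1, "a"): {2},
    (3, "c"): {4},
}
FINAL = {2, 4}


def validate(children):
    """Simulate the NFA on sets of states: one pass, no backtracking.

    Returns (valid, trace), where trace[i] lists the particles that could
    have matched children[i] given the children before it.
    """
    live = {0}
    trace = []
    for name in children:
        live = {t for s in live for t in TRANSITIONS.get((s, name), ())}
        trace.append(sorted(live))
        if not live:
            return False, trace  # no particle can match this child
    return bool(live & FINAL), trace


if __name__ == "__main__":
    print(validate(["b", "c"]))  # (True, [[1, 3], [4]])
    print(validate(["b", "b"]))  # (False, [[1, 3], []])
    print(validate(["b"]))       # (False, [[1, 3]])
```

The cost is that each step touches every live particle instead of doing a single table lookup.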

In the XML Schema WG, some WG members tell me they thought determinism was a good idea because it would mean XSD validators could use the content-model validation code base from existing DTD validators. (Truth! This suggests a rather touching faith in code reuse, or possibly excessive consumption of hallucinogens.) At the time, many WG members gave me the impression that they didn't feel they understood the issue clearly enough to have an independent view and thought that backward compatibility with DTDs would be safer than making a change which might have consequences they could not foresee. It would be easy (they thought) to change later if it proved unnecessary or unhelpful. (Later, during the work on XSD 1.1, vendors resisted any attempt to eliminate the determinism rule as if they were repelling an attack on their virtue. So much for "We can always relax the constraint later".)

Some people (both in the SGML and in the XSD WGs) have suggested the determinism rule is useful because it allows annotations to be associated with particular positions in the content model. In the XSD case, this strikes me as fighting a lost battle -- it is just as easy to annotate based on position in the instance, and the existing XPath infrastructure and mindshare makes that a far preferable course. In the SGML case, that argument doesn't apply since XPath hadn't yet been invented, but in SGML annotations are not allowed on individual particles of a content model, so the idea was a non-starter in any case.

This idea survives, though, in the ability of XSD schema authors to add xs:annotation elements to individual particles in a content model. I have been asking for fourteen years or so now without finding anyone who has used this facility in a production schema, anyone who has used this facility in an experimental test schema, or anyone who has heard of anyone using this facility in any kind of schema at all. Nor have I found anyone able to provide a coherent account of a concrete application in which it would be helpful. (They, in response, have pointed out to me that I have never provided an ironclad proof that it could never ever under any circumstances be helpful, in triplicate with endorsements from five full professors of software engineering. They're right; I guess I'm just lazy. But I have never understood why the bar for including stupidities in XSD was so low, and the bar for eliminating them so high.)

The only halfway plausible argument I have ever heard came from a few (very few) thoughtful WG members (very few -- well, one) who made clear that they understood the technical issues perfectly well, but that in the transaction-server deployment scenarios they had in mind, validation speed was essential even in situations where schema pre-compilation and caching would be infeasible. So they wanted to retain the determinism constraint so as to avoid (a) the cost of validating with an NFA instead of a DFA, and (b) the quadratic cost of determinizing an NFA. I don't actually think this is a compelling argument (why should schema caching be impossible for a transaction server, for heaven's sake?), but the person who made it undeniably knows more about transaction servers than I do.
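Caching can amortize both of the costs named in that argument. One way to do it is lazy (on-the-fly) subset construction, a technique this answer does not itself propose. Each DFA transition is computed from the NFA the first time a validator needs it and served from a cache afterwards. The sketch below uses `functools.lru_cache` as the cache. The automaton is again a hypothetical hand-numbered one for `((b, a) | (b, c))`.

```python
from functools import lru_cache

# Hypothetical position automaton for ((b, a) | (b, c)); state 0 is the start.
TRANSITIONS = {(0, "b"): (1, 3), (1, "a"): (2,), (3, "c"): (4,)}
FINAL = frozenset({2, 4})
START = frozenset({0})


@lru_cache(maxsize=None)
def step(states, name):
    """One DFA transition, built from the NFA on first use and then cached."""
    return frozenset(t for s in states for t in TRANSITIONS.get((s, name), ()))


def validate(children):
    """Validate a child sequence; repeated paths cost one cache lookup per child."""
    states = START
    for name in children:
        states = step(states, name)
        if not states:
            return False
    return bool(states & FINAL)


if __name__ == "__main__":
    instances = [["b", "a"], ["b", "c"], ["b", "a"], ["b", "b"]]
    print([validate(kids) for kids in instances])  # [True, True, True, False]
    print(step.cache_info())  # CacheInfo(hits=4, misses=4, maxsize=None, currsize=4)
```

The sketch only builds state sets that real documents actually reach, and it never determinizes the whole automaton up front. It cannot show whether this would meet a transaction server's latency requirements.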

In sum: SGML invented the determinism rule for reasons lost in the mists of time; no two members of the WG tell the same story, and I conclude that there was no consensus there on the reason to have the rule, only a lot of individual reasons. XML DTDs retained the rule for compatibility with SGML (it was not enough that legal SGML DTDs be legal XML, we wanted all legal XML DTDs to be legal SGML -- the WG was made up of SGML people). And XSD got the determinism rule from XML DTDs and then retained it for no particular reasons but fear, uncertainty, and doubt.

Sigh.
