W3C breaks XHTML 1.1 parsing by removing modules from web site

Problem description

The W3C recommended list of doctype declarations indicates the following doctype for XHTML 1.1:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" 
   "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

This is the same system ID recommended by A List Apart and the Wiley Dummies site, among many others. It was one of the standard system IDs for the modular XHTML 1.1 DTD.

Unfortunately this modular DTD refers to other XML entities, some of which the W3C has removed from its site, completely breaking parsing.

You can test this in Java 11. Start with the following XHTML 1.1 file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
  <title>XHTML 1.1 Skeleton</title>
</head>
<body>
</body>
</html>

Try to parse it using a standard, built-in Java parser:

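import java.io.BufferedInputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
documentBuilderFactory.setNamespaceAware(true);
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
final Document document;
// The parser fetches the external DTD named in the DOCTYPE while parsing.
try (InputStream inputStream = new BufferedInputStream(getClass().getResourceAsStream("xhtml-1.1-test.xhtml"))) {
  document = documentBuilder.parse(inputStream);
}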

Parsing will fail, throwing a java.io.FileNotFoundException for http://www.w3.org/TR/xhtml11/DTD/xhtml-datatypes-1.mod. Apparently the W3C has removed this entity from its web site altogether.
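
To check that the failure comes from fetching the DTD rather than from the document itself, one option is to tell the parser not to load the external DTD at all. The sketch below assumes the JDK's built-in, Xerces-derived parser, which recognizes the `load-external-dtd` feature URI; other JAXP implementations may reject it. The class and method names are made up for illustration. With the DTD skipped, named entities such as `&nbsp;` are no longer defined, so this is a diagnostic rather than a fix.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class NoExternalDtdDemo {
    // Xerces feature URI; assumed to be supported by the JAXP implementation in use.
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    /** Parses the given XML without fetching the external DTD named in its DOCTYPE. */
    public static Document parseWithoutDtd(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setFeature(LOAD_EXTERNAL_DTD, false);
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        return factory.newDocumentBuilder().parse(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" "
                + "\"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">\n"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\">"
                + "<head><title>XHTML 1.1 Skeleton</title></head><body></body></html>";
        Document document = parseWithoutDtd(xml);
        // Prints the local name of the root element.
        System.out.println(document.getDocumentElement().getLocalName());
    }
}
```

If this parses quickly while the original code fails, the problem lies in DTD retrieval and not in the XHTML file.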

If instead http://www.w3.org/MarkUp/DTD/xhtml11.dtd is used (which appears as a comment in the XHTML 1.1 specification DTD), parsing completes normally (albeit after about 10 minutes).

Why does the W3C make only some of the entities available in the http://www.w3.org/TR/xhtml11/DTD/ collection, breaking XHTML 1.1 parsing with a standard system ID? Why aren't all the modules that are available at http://www.w3.org/MarkUp/DTD/ available there as well? Who at the W3C should I contact to get this fixed? (And why does HTTP access take so long for these entities?)

Recommended answer

The URL you mentioned as alternative - http://www.w3.org/MarkUp/DTD/xhtml11.dtd - seems to be consistently used in the XHTML 1.1 specs/DTDs/modules and appears to be the one endorsed by W3C, rather than http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd. My guess is access to these declaration sets is deliberately throttled, as W3C doesn't want to serve these to the general public; you're supposed to store these locally and use an SGML/XML catalog file mapping identifiers to your local entity/declaration sets.
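
In Java, the idea of serving the declaration sets from local copies can be sketched with a SAX `EntityResolver` that maps public identifiers to files on disk. This is only an illustration under some assumptions: the class name, the temporary directory, and the one-line stand-in DTD (which declares a made-up `demo` entity) are all hypothetical; in practice you would point the map at downloaded copies of `xhtml11.dtd` and its modules.

```java
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class LocalDtdResolverDemo {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    /** Maps public IDs to local files; unknown IDs fall through to default resolution. */
    static final class LocalDtdResolver implements EntityResolver {
        private final Map<String, Path> localCopies;

        LocalDtdResolver(Map<String, Path> localCopies) {
            this.localCopies = localCopies;
        }

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            Path local = publicId == null ? null : localCopies.get(publicId);
            if (local == null) {
                return null; // let the parser resolve the entity the default way
            }
            InputSource source = new InputSource(local.toUri().toString());
            source.setPublicId(publicId);
            return source;
        }
    }

    /** Parses a small document against a stand-in local DTD and returns the paragraph text. */
    public static String demo() throws Exception {
        Path dir = Files.createTempDirectory("xhtml-dtd");
        Path dtd = dir.resolve("xhtml11.dtd");
        // Stand-in for the real DTD: it only declares one illustrative entity.
        Files.writeString(dtd, "<!ENTITY demo \"resolved locally\">\n");

        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" "
                + "\"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">\n"
                + "<html xmlns=\"" + XHTML_NS + "\"><head><title>t</title></head>"
                + "<body><p>&demo;</p></body></html>";

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(
                new LocalDtdResolver(Map.of("-//W3C//DTD XHTML 1.1//EN", dtd)));
        Document document = builder.parse(new InputSource(new StringReader(xml)));
        return document.getElementsByTagNameNS(XHTML_NS, "p").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

Because the returned `InputSource` carries the local file URI as its system ID, relative references inside a local DTD resolve against the local directory instead of the W3C site. Modules referenced by absolute URL would still need their own map entries.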

I had success in validating an XHTML 1.1 file using libxml2's xmllint command-line tool by invoking

 SGML_CATALOG_FILES=./catalog xmllint --catalogs --dtdvalid xhtml11.dtd testdoc.xhtml

with a catalog file having the following content (and the referenced .dtd, .mod and .ent files in place in that directory, of course):

OVERRIDE YES

SGMLDECL "xml1.dcl"
PUBLIC "-//W3C//DTD XHTML 1.1//EN" "xhtml11.dtd"
PUBLIC "-//W3C//ENTITIES XHTML 1.1 Document Model 1.0//EN" "xhtml11-model-1.mod"
PUBLIC "-//W3C//ENTITIES XHTML Common Attributes 1.0//EN" "xhtml-attribs-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-attribs-1.mod" "xhtml-attribs-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Base Element 1.0//EN" "xhtml-base-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-base-1.mod" "xhtml-base-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML BDO Element 1.0//EN" "xhtml-bdo-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-bdo-1.mod" "xhtml-bdo-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Block Phrasal 1.0//EN" "xhtml-blkphras-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-blkphras-1.mod" "xhtml-blkphras-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Block Presentation 1.0//EN" "xhtml-blkpres-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-blkpres-1.mod" "xhtml-blkpres-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Block Structural 1.0//EN" "xhtml-blkstruct-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-blkstruct-1.mod" "xhtml-blkstruct-1.mod"
PUBLIC "-//W3C//ENTITIES XHTML Character Entities 1.0//EN" "xhtml-charent-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-charent-1.mod" "xhtml-charent-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Client-side Image Maps 1.0//EN" "xhtml-csismap-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-csismap-1.mod" "xhtml-csismap-1.mod"
PUBLIC "-//W3C//ENTITIES XHTML Datatypes 1.0//EN" "xhtml-datatypes-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-datatypes-1.mod" "xhtml-datatypes-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Editing Markup 1.0//EN" "xhtml-edit-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-edit-1.mod" "xhtml-edit-1.mod"
PUBLIC "-//W3C//ENTITIES XHTML Intrinsic Events 1.0//EN" "xhtml-events-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-events-1.mod" "xhtml-events-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Forms 1.0//EN" "xhtml-form-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-form-1.mod" "xhtml-form-1.mod"
PUBLIC "-//W3C//ENTITIES XHTML Modular Framework 1.0//EN" "xhtml-framework-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-framework-1.mod" "xhtml-framework-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Hypertext 1.0//EN" "xhtml-hypertext-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-hypertext-1.mod" "xhtml-hypertext-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Images 1.0//EN" "xhtml-image-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-image-1.mod" "xhtml-image-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Inline Phrasal 1.0//EN" "xhtml-inlphras-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-inlphras-1.mod" "xhtml-inlphras-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Inline Presentation 1.0//EN" "xhtml-inlpres-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-inlpres-1.mod" "xhtml-inlpres-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Inline Structural 1.0//EN" "xhtml-inlstruct-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-inlstruct-1.mod" "xhtml-inlstruct-1.mod"
PUBLIC "-//W3C//ENTITIES XHTML Inline Style 1.0//EN" "xhtml-inlstyle-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-inlstyle-1.mod" "xhtml-inlstyle-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Legacy Markup 1.0//EN" "xhtml-legacy-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-legacy-1.mod" "xhtml-legacy-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Link Element 1.0//EN" "xhtml-link-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-link-1.mod" "xhtml-link-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Lists 1.0//EN" "xhtml-list-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-list-1.mod" "xhtml-list-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Metainformation 1.0//EN" "xhtml-meta-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-meta-1.mod" "xhtml-meta-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Embedded Object 1.0//EN" "xhtml-object-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-object-1.mod" "xhtml-object-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Param Element 1.0//EN" "xhtml-param-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-param-1.mod" "xhtml-param-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Presentation 1.0//EN" "xhtml-pres-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-pres-1.mod" "xhtml-pres-1.mod"
PUBLIC "-//W3C//ENTITIES XHTML Qualified Names 1.0//EN" "xhtml-qname-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-qname-1.mod" "xhtml-qname-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Ruby 1.0//EN" "xhtml-ruby-1.mod"
SYSTEM "http://www.w3.org/TR/ruby/xhtml-ruby-1.mod" "xhtml-ruby-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Scripting 1.0//EN" "xhtml-script-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-script-1.mod" "xhtml-script-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Server-side Image Maps 1.0//EN" "xhtml-ssismap-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-ssismap-1.mod" "xhtml-ssismap-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Document Structure 1.0//EN" "xhtml-struct-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-struct-1.mod" "xhtml-struct-1.mod"
PUBLIC "-//W3C//DTD XHTML Style Sheets 1.0//EN" "xhtml-style-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-style-1.mod" "xhtml-style-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Tables 1.0//EN" "xhtml-table-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-table-1.mod" "xhtml-table-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Text 1.0//EN" "xhtml-text-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-text-1.mod" "xhtml-text-1.mod"
PUBLIC "-//W3C//ENTITIES XHTML 1.1 Document Model 1.0//EN" "xhtml11-model-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml11-model-1.mod" "xhtml11-model-1.mod"
PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN" "xhtml-lat1.ent"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-lat1.ent" "xhtml-lat1.ent"
PUBLIC "-//W3C//ENTITIES Special for XHTML//EN" "xhtml-special.ent"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-special.ent" "xhtml-special.ent"
PUBLIC "-//W3C//ENTITIES Symbols for XHTML//EN" "xhtml-symbol.ent"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-symbol.ent" "xhtml-symbol.ent"

Note this is SGML/traditional/plain catalog syntax. If you want to use it with Java/JAXP, you'll have to convert it into a catalog file in XML syntax.
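
As a rough sketch of what that conversion could look like, the example below writes a two-entry OASIS XML catalog (one `public` entry and one `system` entry, corresponding to `PUBLIC` and `SYSTEM` lines in the plain catalog) and plugs it into a `DocumentBuilder` through the `javax.xml.catalog` API that ships with Java 9 and later. The class name, the temporary directory, and the stand-in DTD with its made-up `demo` entity are assumptions for illustration only; a real catalog would list every module and point at real local copies.

```java
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XmlCatalogDemo {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    /** Builds a tiny XML catalog in a temp directory, parses with it, returns the paragraph text. */
    public static String demo() throws Exception {
        Path dir = Files.createTempDirectory("xhtml-catalog");
        // Stand-ins for the real files; the DTD only declares one illustrative entity.
        Files.writeString(dir.resolve("xhtml11.dtd"),
                "<!ENTITY demo \"resolved via catalog\">\n");
        Files.writeString(dir.resolve("xhtml-datatypes-1.mod"), "");

        // XML-syntax equivalent of one PUBLIC and one SYSTEM line of the plain catalog.
        Path catalog = dir.resolve("catalog.xml");
        Files.writeString(catalog, String.join("\n",
                "<?xml version=\"1.0\"?>",
                "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">",
                "  <public publicId=\"-//W3C//DTD XHTML 1.1//EN\" uri=\"xhtml11.dtd\"/>",
                "  <system systemId=\"http://www.w3.org/MarkUp/DTD/xhtml-datatypes-1.mod\"",
                "          uri=\"xhtml-datatypes-1.mod\"/>",
                "</catalog>"));

        // CatalogResolver is also a SAX EntityResolver, so the builder accepts it directly.
        CatalogResolver resolver =
                CatalogManager.catalogResolver(CatalogFeatures.defaults(), catalog.toUri());

        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" "
                + "\"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">\n"
                + "<html xmlns=\"" + XHTML_NS + "\"><head><title>t</title></head>"
                + "<body><p>&demo;</p></body></html>";

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(resolver);
        Document document = builder.parse(new InputSource(new StringReader(xml)));
        return document.getElementsByTagNameNS(XHTML_NS, "p").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

Relative `uri` values are resolved against the location of the catalog file, so keeping the catalog next to the `.dtd`, `.mod` and `.ent` files mirrors the directory layout used with xmllint above.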
