HTML 4, HTML 5, XHTML, MIME types - the definitive resource


Question


    The topics of HTML vs. XHTML and XHTML as text/html vs. XHTML as XHTML are quite complex. Unfortunately it's hard to get a complete picture, since information is spread mostly in bits and pieces around the web or is buried deep in W3C tech jargon. In addition there's some misinformation being circulated. I propose to make this the definitive SO resource about the topic, describing the most important aspects of:

    • HTML 4
    • HTML 5
    • XHTML 1.0 as text/html, application/xhtml+xml
    • XHTML 1.1 as application/xhtml+xml

    What are the practical implications of each?
    What are common pitfalls?
    What is the importance of proper MIME types for each?
    How do different browsers handle them?

    I'd like to see one answer per technology. I'm making this a community wiki, so rather than contributing redundant answers, please edit answers to complete the picture. Feel free to start with stubs. Also feel free to edit this question.

    Answer

    Contents.

    • Terminology
    • Languages and Serializations
    • Specifications
    • Browser Parsers and Content (MIME) Types
    • Browser Support
    • Validators and Document Type Definitions
    • Quirks, Limited Quirks, and Standards modes.

    Terminology

    One of the difficulties of describing this is that the terminology within the official specifications has changed over the years since HTML was first introduced. What follows is based on HTML5 terminology. Also, "file" is used as a generic term to mean a file, document, input stream, octet stream, etc., to avoid having to make fine distinctions.

    Languages and Serializations

    HTML and XHTML are defined in terms of a language and a serialization.

    The language defines the vocabulary of the elements and attributes, and their content model, i.e. which elements are permitted inside which other elements, which attributes are allowed on which element, along with the purpose and meaning of each element and attribute.

    The serialization defines how mark-up is used to describe these elements and attributes within a text document. This includes which tags are required and which can be inferred, and the rules for those inferences. It describes such things as how void elements should be marked up (e.g. ">" vs "/>") and when attribute values need to be quoted.
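
The split between language and serialization can be sketched with Python's standard `xml.etree.ElementTree` module. It is used here only as a stand-in for an in-memory object model (an assumption for illustration, not how any browser works internally). The same small tree is written out once with HTML-style rules and once with XML-style rules, and the void `br` element is the visible difference.

```python
import xml.etree.ElementTree as ET

# Build one in-memory tree: <p> containing text, a void <br>, and more text.
paragraph = ET.Element("p")
paragraph.text = "line one"
line_break = ET.SubElement(paragraph, "br")
line_break.tail = "line two"

# Serialize the same tree under two different sets of mark-up rules.
as_html = ET.tostring(paragraph, encoding="unicode", method="html")
as_xml = ET.tostring(paragraph, encoding="unicode", method="xml")

print(as_html)  # void element written as <br>
print(as_xml)   # void element written as <br />
```

The element vocabulary (the language) is identical in both outputs; only the mark-up rules (the serialization) differ.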

    Specifications

    The HTML 4.01 specification is the current specification that defines both the HTML language and the HTML serialization.

    The XML 1.0 specification defines a serialization but leaves the language to be defined by other specifications, which are termed "XML applications".

    The XHTML 1.0 and 1.1 specifications are both in use. Essentially, they use the same language as HTML 4.01 but use a different serialization, one that is compatible with the XML 1.0 specification. That is, XHTML is an XML application.

    The HTML5 (as of 2010-04-18, draft) specification describes a new language for both HTML and XHTML. This language is mostly a superset of the HTML 4.01 language, but where differences arise it is intended to be backward compatible only with existing web tools (e.g. browsers, search engines and authoring tools), not with previous specifications. So the meaning of some elements is occasionally changed from the earlier specifications. Similarly, each of the serializations is backward compatible with the current tools.

    Browser Parsers and Content (MIME) Types

    When a text file is sent to a browser, it is parsed into the browser's internal memory structure (object model). To do so, the browser uses a parser which follows either the HTML serialization rules or the XML serialization rules. Which parser it uses depends on what it deduces the content type to be; for non-local files, this is based on the "Content-Type" HTTP header. Internally, once the file has been parsed, the browser treats the object model in almost the same way, regardless of whether it was originally supplied using an HTML or XHTML serialization.

    For a browser to use its XHTML parser, the content type HTTP header must be one of the XML content types. Most commonly, this is either application/xml or application/xhtml+xml. Any non-XML content type will mean that the file, regardless of whether it meets all the XHTML language and serialization rules or not, will not be processed by the browser as XHTML.

    Using an HTTP content type of text/html (or, in most fallback scenarios, a missing content type or any other non-XML type) will cause the browser to use its HTML serialization parser.
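
The selection rule described above can be sketched as a small function. This is a simplified illustration, not any browser's actual logic: the function name `choose_parser` and the list of XML types are assumptions, and real browsers also apply content sniffing and handle many non-markup types that are ignored here.

```python
from typing import Optional

# XML media types named in the text, plus text/xml (an assumption added here).
XML_TYPES = {"application/xml", "text/xml", "application/xhtml+xml"}


def choose_parser(content_type: Optional[str]) -> str:
    """Return "xml" or "html" for a Content-Type header value (hypothetical helper)."""
    if not content_type:
        # Missing header: fall back to the HTML parser, as the text describes.
        return "html"
    # Drop parameters such as "; charset=utf-8" and normalise the case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in XML_TYPES or media_type.endswith("+xml"):
        return "xml"
    return "html"


print(choose_parser("text/html; charset=utf-8"))
print(choose_parser("application/xhtml+xml"))
```

Note that the decision depends only on the header, never on the file's content: a perfectly valid XHTML file sent as text/html still goes to the HTML parser.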

    One key difference between the two parsers is that the HTML serialization parser performs error recovery. If the input file to the parser does not meet the HTML serialization rules, the parser will recover in ways reverse-engineered from previous browsers and carry on building its object model until it reaches the end of the file. HTML5 contains the first normative definition of this recovery, but as of 2010-04-26 no mainstream browser has shipped a release version with an implementation of the algorithm enabled.

    In contrast, the XML serialization parser will stop when it encounters anything that it cannot interpret as XML (i.e. when it discovers that the file is not XML well-formed). This is required of parsers by the XML 1.0 specification.
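
The contrast can be seen with Python's standard library, used here as a rough analogy only. `html.parser.HTMLParser` is a lenient tokenizer, not the HTML5 recovery algorithm, and `xml.etree.ElementTree` is not a browser's XML parser. The sample mark-up and the helper names are made up for this sketch.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Deliberately broken mark-up: <b> and both <p> elements are never closed.
MALFORMED = "<p>An <b>unclosed bold element<p>Second paragraph"


class TagCollector(HTMLParser):
    """Records every start tag the lenient HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def parse_as_html(markup):
    """Feed the mark-up to the lenient parser and return the tags it saw."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.start_tags


def is_well_formed_xml(markup):
    """Return False as soon as the XML parser rejects the input."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


print(parse_as_html(MALFORMED))       # the HTML-style parser carries on to the end
print(is_well_formed_xml(MALFORMED))  # the XML parser refuses the same input
```

The HTML-style parser reaches the end of the input and reports all three start tags, while the XML parser raises an error because the input is not well-formed.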

    Browser Support

    Most modern browsers contain support for both an HTML parser and an XML parser. However, in Microsoft Internet Explorer versions 8.0 and earlier, the XML parser cannot directly create an object model for rendering as an HTML page. The XML structure can, however, be processed with an XSLT file to create a stream, which can in turn be parsed using the HTML parser to create an object model that can be rendered.

    Starting with Internet Explorer 9 Platform Preview, XHTML supplied using an XML content type can be parsed directly in the same way as in the other modern browsers.

    When their XML parsers detect that an input file is not XML well-formed, browsers react differently: some display an error message, others show the page as constructed up to the point where the error was detected, and some offer the user the opportunity to have the file re-parsed using their HTML parser.

    Validators and Document Type Definitions

    HTML and XHTML files can begin with a Document Type Definition (DTD) declaration which indicates the language and serialization that is being used in the document. Validators, such as the one at http://validator.w3.org/, use this information to match the language and serialization used within the file against the rules defined in the DTD. They then report errors wherever mark-up in the file violates the rules in the DTD.

    Not all HTML serialization and language rules can be described in a DTD, so validators only test for a subset of all the rules described by the specifications.

    HTML 4.01 and XHTML 1.0 define Strict, Transitional, and Frameset DTDs which differ in the language elements and attributes that are permitted in compliant files.

    Validators based on HTML5, such as validator.nu, behave more like browsers: they process the page according to the HTTP content type and use a non-DTD-based rule set, so that they catch errors that cannot be described by DTDs.

    Quirks, Limited Quirks, and Standards modes.

    Browsers do not validate the files sent to them. Nor do they use any DTD declaration to determine the language or serialization of the file. However, they do use it to guess the era in which the page was created, and therefore the likely parsing and rendering behaviour the author would have expected of a browser at that time. Accordingly, they define three parsing and rendering modes, known as Quirks mode, Limited Quirks (or Almost Standards) mode and Standards mode.

    Any file served using an XML content type is always processed in standards mode. For files parsed using the HTML parser, if there is no DTD provided or the DTD is determined to be very old, browsers use their quirks mode. Broadly speaking, HTML 4.01 and XHTML files processed as text/html will be processed with limited quirks mode if they contain a transitional DTD and with standards mode if using a strict DTD.

    Where the DTD is not recognised, the mode is determined by a complex set of rules. One special case is where the public and system identifiers are omitted and the declaration is simply <!DOCTYPE html>. This is known to be the shortest doctype declaration where current browsers will treat the file as standards mode. For that reason, it is the declaration specified to be used for HTML5 compliant files.
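
The broad rules in this section can be sketched as a lookup. This is a deliberately simplified illustration: the function name `guess_mode` is hypothetical, only a handful of well-known public identifiers are listed, and the "very old DTD" case and the system identifier are ignored. Real browsers apply a much longer rule set, so anything unrecognised is reported as "undetermined" rather than guessed.

```python
import re
from typing import Optional

# A small sample of public identifiers; the full browser lists are far longer.
STRICT_PUBLIC_IDS = {
    "-//W3C//DTD HTML 4.01//EN",
    "-//W3C//DTD XHTML 1.0 Strict//EN",
}
TRANSITIONAL_PUBLIC_IDS = {
    "-//W3C//DTD HTML 4.01 Transitional//EN",
    "-//W3C//DTD XHTML 1.0 Transitional//EN",
}


def guess_mode(doctype: Optional[str], served_as_xml: bool = False) -> str:
    """Approximate the rendering mode for a doctype declaration (hypothetical helper)."""
    if served_as_xml:
        # XML content types are always processed in standards mode.
        return "standards"
    if doctype is None:
        # No declaration at all: quirks mode.
        return "quirks"
    if " ".join(doctype.split()).lower() == "<!doctype html>":
        # The short declaration specified for HTML5.
        return "standards"
    match = re.search(r'PUBLIC\s+"([^"]*)"', doctype, flags=re.IGNORECASE)
    public_id = match.group(1) if match else None
    if public_id in STRICT_PUBLIC_IDS:
        return "standards"
    if public_id in TRANSITIONAL_PUBLIC_IDS:
        return "limited quirks"
    # Everything else falls under the "complex set of rules" not modelled here.
    return "undetermined"


print(guess_mode("<!DOCTYPE html>"))
print(guess_mode(None))
```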
