Does HTML5 specify a default character encoding for HTML documents if no character encoding is supplied?
Question
An example HTML document retrieved over HTTP lacks:
- an HTTP Content-Type header
- an HTML <meta charset="<character encoding>" /> element
- an HTML <meta http-equiv='Content-Type' content='text/html; charset=<character encoding>'> element
With regard to HTML5, is a default (for example, UTF-8) assumed as the character encoding? Or is it entirely up to the application reading the HTML document to choose a default?
Solution
The charset is determined using these rules:
- User override.
- An HTTP "charset" parameter in a "Content-Type" field.
- A Byte Order Mark before any other data in the HTML document itself.
- A META declaration with a "charset" attribute.
- A META declaration with an "http-equiv" attribute set to "Content-Type" and a value set for "charset".
- Unspecified heuristic analysis.
...and then...
- Normalize the given character encoding string according to the Charset Alias Matching rules defined in Unicode Technical Standard #22.
- Override some problematic encodings, i.e. intentionally treat some encodings as if they were different encodings. The most common override is treating US-ASCII and ISO-8859-1 as Windows-1252, but there are several other encoding overrides listed in this table. As the specification notes, "The requirement to treat certain encodings as other encodings according to the table above is a willful violation of the W3C Character Model specification."
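The rules above can be sketched in Python roughly as follows. This is a simplified illustration, not the specification's algorithm: the names `detect_encoding`, `normalize_label`, and `apply_override` are hypothetical, a single regular expression stands in for both META forms, the `OVERRIDES` table holds only the two overrides named above, and the `fallback` parameter is an assumed placeholder for the "unspecified heuristic analysis" step.

```python
import re

# Byte Order Marks and the encodings they signal.
BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]

# Illustrative subset of the override table, keyed by normalized label.
OVERRIDES = {"usascii": "windows-1252", "iso88591": "windows-1252"}

HTTP_CHARSET = re.compile(r"charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE)
# Matches both <meta charset=...> and <meta http-equiv=... content="...; charset=...">.
META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)", re.IGNORECASE
)


def normalize_label(label: str) -> str:
    """Simplified UTS #22 alias matching: keep letters and digits,
    lowercase, and drop zeros that do not follow a digit."""
    stripped = re.sub(r"[^a-z0-9]", "", label.lower())
    return re.sub(r"(?<![0-9])0+", "", stripped)


def apply_override(label: str) -> str:
    """Treat problematic encodings as their replacement encodings."""
    return OVERRIDES.get(normalize_label(label), label.lower())


def detect_encoding(body, content_type=None, user_override=None,
                    fallback="windows-1252"):
    """Walk the precedence list; `fallback` is an assumed stand-in
    for heuristic analysis."""
    if user_override:
        return apply_override(user_override)
    if content_type:
        match = HTTP_CHARSET.search(content_type)
        if match:
            return apply_override(match.group(1))
    for bom, encoding in BOMS:
        if body.startswith(bom):
            return apply_override(encoding)
    # Only the start of the document is scanned for a META declaration.
    match = META_CHARSET.search(body[:1024])
    if match:
        return apply_override(match.group(1).decode("ascii"))
    return apply_override(fallback)
```

For example, `detect_encoding(b"<html>", content_type="text/html; charset=ISO-8859-1")` yields `windows-1252`, because the declared ISO-8859-1 label is overridden. A document with none of the declarations from the question falls through to whatever `fallback` the caller chose, which mirrors the point that the outcome then depends on the application's heuristics.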
But the most important thing is:
You should always specify a character encoding on every HTML document, or bad things will happen. You can do it the hard way (HTTP Content-Type header), the easy way (<meta http-equiv> declaration), or the new way (<meta charset> attribute), but please do it. The web thanks you.
Sources:
- http://blog.whatwg.org/the-road-to-html-5-character-encoding
- http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#determining-the-character-encoding