Does HTML5 specify a default character encoding for HTML documents if no character encoding is supplied?

Question

Suppose an HTML document retrieved over HTTP lacks all of the following:

  • an HTTP Content-Type header
  • an HTML <meta charset="<character encoding>" /> element
  • an HTML <meta http-equiv='Content-Type' content='text/html; charset=<character encoding>'> element

With regard to HTML5, is a default (for example, UTF-8) assumed as the character encoding? Or is it entirely up to the application reading the HTML document to choose a default?

Answer

The character encoding is determined by the following rules, in order of precedence:

  1. User override.
  2. An HTTP "charset" parameter in a "Content-Type" field.
  3. A Byte Order Mark before any other data in the HTML document itself.
  4. A META declaration with a "charset" attribute.
  5. A META declaration with an "http-equiv" attribute set to "Content-Type" and a value set for "charset".
  6. Unspecified heuristic analysis.
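The precedence list above can be sketched in Python. This is a minimal illustration, not a conforming implementation of the spec's encoding-sniffing algorithm: it follows the quoted order literally (later revisions of the spec let a BOM win even over the HTTP header), the names `sniff_encoding` and `MetaCollector` are hypothetical, the 1024-byte prescan window is an assumption, and because step 6 is unspecified, the fallback used here (UTF-8 if the bytes decode cleanly, otherwise windows-1252) is an invented stand-in.

```python
import codecs
import re
from email.message import Message
from html.parser import HTMLParser
from typing import Optional

# Byte order marks checked in step 3 (UTF-8 first, then the two UTF-16 forms).
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
]

# Assumed prescan window for <meta> declarations (not taken from the answer).
PRESCAN_BYTES = 1024


class MetaCollector(HTMLParser):
    """Records the first <meta charset> and the first http-equiv charset seen."""

    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None
        self.http_equiv_charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases names; valueless attributes arrive as None.
        a = {name: (value or "") for name, value in attrs}
        if "charset" in a and self.charset is None:
            self.charset = a["charset"].strip().lower()
        if (a.get("http-equiv", "").lower() == "content-type"
                and self.http_equiv_charset is None):
            m = re.search(r"charset\s*=\s*[\"']?([^\s;\"']+)",
                          a.get("content", ""), re.I)
            if m:
                self.http_equiv_charset = m.group(1).lower()


def sniff_encoding(body: bytes, content_type: Optional[str] = None,
                   user_override: Optional[str] = None) -> str:
    # 1. User override.
    if user_override:
        return user_override.lower()
    # 2. charset parameter of the HTTP Content-Type header.
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()  # lowercased, or None if absent
        if charset:
            return charset
    # 3. Byte order mark at the very start of the document.
    for bom, name in BOMS:
        if body.startswith(bom):
            return name
    # 4 and 5. <meta charset> first, then <meta http-equiv="Content-Type">.
    # latin-1 maps every byte to one character, so this decode never fails.
    collector = MetaCollector()
    collector.feed(body[:PRESCAN_BYTES].decode("latin-1"))
    collector.close()
    if collector.charset:
        return collector.charset
    if collector.http_equiv_charset:
        return collector.http_equiv_charset
    # 6. The spec leaves the heuristic unspecified; this stand-in is hypothetical.
    try:
        body.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "windows-1252"
```

For example, `sniff_encoding(b'<meta charset="utf-8">', "text/html; charset=windows-1251")` returns `"windows-1251"`, because the HTTP header outranks the in-document declaration.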

...and then...

  1. Normalize the given character encoding string according to the Charset Alias Matching rules defined in Unicode Technical Standard #22.
  2. Override some problematic encodings, i.e. intentionally treat some encodings as if they were different encodings. The most common override is treating US-ASCII and ISO-8859-1 as Windows-1252, but there are several other encoding overrides listed in a table in the specification. As the specification notes, "The requirement to treat certain encodings as other encodings according to the table above is a willful violation of the W3C Character Model specification."
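These two post-processing steps can be sketched as follows. The normalization is a simplified reading of the UTS #22 Charset Alias Matching rule (keep only ASCII letters and digits, lowercase them, and drop any "0" not preceded by a digit); real libraries such as ICU differ in edge cases. The override table is deliberately incomplete: it holds only the two overrides named in the answer, and the function names are hypothetical.

```python
def normalize_charset_name(label: str) -> str:
    """Simplified UTS #22 alias matching: "ISO_8859-1" -> "iso88591"."""
    out = []
    for ch in label:
        if not (ch.isascii() and ch.isalnum()):
            continue  # delete everything except a-z, A-Z, 0-9
        ch = ch.lower()
        if ch == "0" and not (out and out[-1].isdigit()):
            continue  # a 0 not preceded by a digit is dropped: "UTF-08" -> "utf8"
        out.append(ch)
    return "".join(out)


# Only the two overrides named in the answer; the spec's table lists more.
ENCODING_OVERRIDES = {
    "usascii": "windows-1252",
    "iso88591": "windows-1252",
}


def resolve_encoding(label: str) -> str:
    key = normalize_charset_name(label)
    # Labels without an override are returned in their normalized form.
    return ENCODING_OVERRIDES.get(key, key)
```

With this table, `resolve_encoding("ISO-8859-1")` yields `"windows-1252"`. That is why a page labeled Latin-1 that contains bytes 0x93 and 0x94 still shows curly quotes: those bytes are control characters in ISO-8859-1 but quotation marks in Windows-1252.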

But the most important thing is:

You should always specify a character encoding on every HTML document, or bad things will happen. You can do it the hard way (HTTP Content-Type header), the easy way (<meta http-equiv> declaration), or the new way (<meta charset> attribute), but please do it. The web thanks you.
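As a sketch of following that advice, the hypothetical WSGI app below declares UTF-8 twice: once in the HTTP `Content-Type` header (the "hard way") and once with `<meta charset>` (the "new way"). The redundancy means the encoding is still known if the page is saved and reopened without its headers. The page content is made up for illustration.

```python
# Illustrative page. The older equivalent of the <meta charset> line is:
#   <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
PAGE = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Café</title>
</head>
<body><p>Grüße</p></body>
</html>
"""


def app(environ, start_response):
    # Encode with the same encoding that both declarations name.
    body = PAGE.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` serves the page; the host and port are arbitrary choices.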
