Does HTML5 specify a default character encoding for HTML documents if no character encoding is supplied?


Question

An example HTML document retrieved over HTTP lacks:

  • an HTTP Content-Type header
  • an HTML <meta charset="<character encoding>" />
  • an HTML <meta http-equiv='Content-Type' content='text/html; charset=<character encoding>'>

With regard to HTML5, is a default, for example UTF-8, assumed as the character encoding? Or is it entirely up to the application reading the HTML document to choose a default?

Solution

The charset is determined using these rules:

  1. User override.
  2. An HTTP "charset" parameter in a "Content-Type" field.
  3. A Byte Order Mark before any other data in the HTML document itself.
  4. A META declaration with a "charset" attribute.
  5. A META declaration with an "http-equiv" attribute set to "Content-Type" and a value set for "charset".
  6. Unspecified heuristic analysis.
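
The precedence order above can be sketched in Python. This is only an illustration under stated assumptions, not a conforming HTML parser: the function name `determine_charset`, the regular expressions, and the decision to scan only the first 1024 bytes for a META declaration are choices made for this example, and only the UTF-8 and UTF-16 byte order marks are checked.

```python
import re
from typing import Optional

# Byte order marks recognised by this sketch (UTF-8 and UTF-16 only).
_BOMS = (
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xff\xfe", "utf-16-le"),
    (b"\xfe\xff", "utf-16-be"),
)

# charset parameter inside an HTTP Content-Type header value.
_HEADER_CHARSET = re.compile(r"charset\s*=\s*[\"']?([^\s;\"']+)", re.IGNORECASE)

# Matches both <meta charset=...> and the charset inside
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
_META_CHARSET = re.compile(
    rb"<meta\s[^>]*?charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", re.IGNORECASE
)


def determine_charset(
    body: bytes,
    content_type: Optional[str] = None,
    user_override: Optional[str] = None,
) -> Optional[str]:
    """Return the declared charset label, or None if nothing declares one."""
    # 1. User override wins over everything else.
    if user_override:
        return user_override
    # 2. charset parameter of the HTTP Content-Type field.
    if content_type:
        match = _HEADER_CHARSET.search(content_type)
        if match:
            return match.group(1)
    # 3. Byte order mark at the very start of the document.
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name
    # 4 and 5. META declaration (assumption: only the first 1024 bytes are scanned).
    match = _META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii")
    # 6. Heuristic analysis is unspecified, so the sketch gives up here.
    return None
```

Returning `None` in the last step reflects the answer to the question: when none of the declarations is present, the outcome depends on whatever heuristics the reading application applies.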

...and then...

  1. Normalize the given character encoding string according to the Charset Alias Matching rules defined in Unicode Technical Standard #22.
  2. Override some problematic encodings, i.e. intentionally treat some encodings as if they were different encodings. The most common override is treating US-ASCII and ISO-8859-1 as Windows-1252, but there are several other encoding overrides listed in this table. As the specification notes, "The requirement to treat certain encodings as other encodings according to the table above is a willful violation of the W3C Character Model specification."
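
The two post-processing steps can be sketched as follows. The alias matching follows the UTS #22 rule (drop everything except ASCII letters and digits, lowercase, then drop zeros that do not follow a digit). The override table here contains only the two entries named above; the remaining entries of the specification's table are deliberately left out, and the names `normalize_label` and `effective_encoding` are hypothetical.

```python
def normalize_label(label: str) -> str:
    """Apply UTS #22 charset alias matching to an encoding label."""
    out = []
    for ch in label:
        # Keep only ASCII letters and digits.
        if not (ch.isascii() and ch.isalnum()):
            continue
        ch = ch.lower()
        # Drop a zero unless the previously kept character is a digit.
        if ch == "0" and not (out and out[-1].isdigit()):
            continue
        out.append(ch)
    return "".join(out)


# Only the overrides mentioned in the text; the full table is not reproduced.
_OVERRIDES = {
    normalize_label("US-ASCII"): "windows-1252",
    normalize_label("ISO-8859-1"): "windows-1252",
}


def effective_encoding(label: str) -> str:
    """Return the encoding to decode with, after normalization and override."""
    return _OVERRIDES.get(normalize_label(label), label)
```

The override matters in practice because the two encodings disagree on bytes 0x80 to 0x9F: `b"\x80".decode("windows-1252")` yields the euro sign, while `b"\x80".decode("iso-8859-1")` yields a control character.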

But the most important thing is:

You should always specify a character encoding on every HTML document, or bad things will happen. You can do it the hard way (HTTP Content-Type header), the easy way (<meta http-equiv> declaration), or the new way (<meta charset> attribute), but please do it. The web thanks you.

