What characters must be escaped in HTML 5?

Question

HTML 4 states pretty clearly which characters should be escaped:

Four character entity references deserve special mention since they are frequently used to escape special characters:

  • &lt;"表示 <签字.
  • "&gt;"表示 >签字.
  • "&"代表 &签字.
  • "&quot;代表标记.

Authors wishing to put the "<" character in text should use "&lt;" (ASCII decimal 60) to avoid possible confusion with the beginning of a tag (start tag open delimiter). Similarly, authors should use "&gt;" (ASCII decimal 62) in text instead of ">" to avoid problems with older user agents that incorrectly perceive this as the end of a tag (tag close delimiter) when it appears in quoted attribute values.

作者应使用&amp;"(ASCII 十进制 38) 而不是&"避免与字符引用的开头混淆(实体引用开放分隔符).作者还应使用&"在属性值,因为在 CDATA 中允许字符引用属性值.

Authors should use "&amp;" (ASCII decimal 38) instead of "&" to avoid confusion with the beginning of a character reference (entity reference open delimiter). Authors should also use "&amp;" in attribute values since character references are allowed within CDATA attribute values.

Some authors use the character entity reference "&quot;" to encode instances of the double quote mark (") since that character may be used to delimit attribute values.
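
To make the HTML 4 advice concrete, here is a minimal Python sketch that replaces those four characters with their entity references. The function name `escape_html4` is hypothetical, and the sketch assumes the result is placed in element text or in a double-quoted attribute value.

```python
def escape_html4(text: str) -> str:
    """Replace the four characters HTML 4 singles out with entity references."""
    # '&' must be replaced first; otherwise the '&' introduced by the
    # other replacements would itself be escaped a second time.
    return (
        text.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace('"', "&quot;")
    )


print(escape_html4('a < b && c > "d"'))
# prints: a &lt; b &amp;&amp; c &gt; &quot;d&quot;
```

Python's standard `html.escape(s, quote=True)` performs the same four replacements and additionally turns `'` into `&#x27;`, so in practice you would usually reach for that instead of a hand-written helper.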

I'm surprised I can't find anything like this in HTML 5. With the help of grep the only non-XML mention I could find comes as an aside regarding the deprecated XMP element:

Use pre and code instead, and escape "<" and "&" characters as "&lt;" and "&amp;" respectively.
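
A minimal Python sketch of that advice, assuming the snippet is only ever placed inside element text such as `<pre><code>` (never inside an attribute). The helper name `escape_pre_text` is hypothetical.

```python
def escape_pre_text(text: str) -> str:
    """Escape text for a <pre><code> block: only '&' and '<' are replaced."""
    # Escape '&' before '<' so the '&' in the new "&lt;" is not escaped again.
    return text.replace("&", "&amp;").replace("<", "&lt;")


source = 'if (a < b && b > c) { print("ok"); }'
page = "<pre><code>" + escape_pre_text(source) + "</code></pre>"
print(page)
```

The `>` and `"` characters are deliberately left untouched here, since in element text neither can be mistaken for markup.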

Could someone point to the official source on this matter?

Recommended answer

The specification defines the syntax for normal elements as:

Normal elements can have text, character references, other elements, and comments, but the text must not contain the character U+003C LESS-THAN SIGN (<) or an ambiguous ampersand. Some normal elements also have yet more restrictions on what content they are allowed to hold, beyond the restrictions imposed by the content model and those described in this paragraph. Those restrictions are described below.
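
As a rough illustration, the check below tests a string against the two text rules quoted above. It assumes the WHATWG definition of an ambiguous ampersand (an `&` followed by one or more ASCII alphanumerics and a `;`, where that name is not a known named character reference) and uses Python's `html.entities.html5` table for the known names. The function names are hypothetical, and the check ignores the extra per-element restrictions the paragraph mentions.

```python
import re
from html.entities import html5  # named character references, keys like "amp;"

# '&', then one or more ASCII alphanumerics, then ';'
_REF_LIKE = re.compile(r"&([0-9A-Za-z]+);")


def has_ambiguous_ampersand(text: str) -> bool:
    """True if text contains '&name;' where 'name;' is not a known named reference."""
    return any(m.group(1) + ";" not in html5 for m in _REF_LIKE.finditer(text))


def is_valid_normal_text(text: str) -> bool:
    """Check the text rules for normal elements: no '<' and no ambiguous ampersand."""
    return "<" not in text and not has_ambiguous_ampersand(text)


for sample in ["Tom & Jerry", "fish &amp; chips", "&foo;", "1 < 2"]:
    print(repr(sample), is_valid_normal_text(sample))
```

A bare `&` followed by a space, or an `&` with no terminating `;`, passes this check; escaping every `&` anyway is the simpler habit and is always valid.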

So you have to escape <, and also & whenever it is followed by anything that could begin a character reference. The rule on ampersands is the only such rule for quoted attributes, as the matching quotation mark is the only thing that will terminate one. (Obviously, if you don’t want to terminate the attribute value there, escape the quotation mark.)
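
A minimal sketch of escaping a value for a quoted attribute, following that reasoning: only `&` and the delimiting quote need attention. It escapes every `&` rather than only ambiguous ones, which is stricter than required but always valid. The helpers `escape_quoted_attr` and `attr` are hypothetical names.

```python
def escape_quoted_attr(value: str, quote: str = '"') -> str:
    """Escape a value for an attribute delimited by `quote`.

    Only '&' and the delimiting quote are replaced; '<' and '>' are left
    alone, since only the matching quote can end a quoted attribute value.
    """
    if quote not in ('"', "'"):
        raise ValueError("quote must be ' or \"")
    # Escaping every '&' (not just ambiguous ones) is simple and always valid.
    escaped = value.replace("&", "&amp;")
    if quote == '"':
        return escaped.replace('"', "&quot;")
    return escaped.replace("'", "&#39;")


def attr(name: str, value: str, quote: str = '"') -> str:
    """Build name="value" (or name='value') with the value safely escaped."""
    return f"{name}={quote}{escape_quoted_attr(value, quote)}{quote}"


print(attr("title", 'Say "hi" & <wave>'))
print(attr("title", "it's", quote="'"))
```

Note that `<` and `>` pass through unchanged inside the quoted value. The HTML 4 text mentions that some older user agents mishandled `>` in attribute values, so escaping it as well is a harmless extra precaution.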

These rules do not apply to
