Which are the HTML, and XML, special characters?


Question

What are the special reserved character entities in HTML and in XML?

The information I have says:

HTML:


  • & (replace with &amp;)
  • < (replace with &lt;)
  • > (replace with &gt;)
  • " (replace with &quot;)
  • ' (replace with &apos;)

XML:


  • < (replace with &lt;)
  • > (replace with &gt;)
  • & (replace with &amp;)
  • ' (replace with &apos;)
  • " (replace with &quot;)

But I cannot find documentation on either of these.

The W3C does mention, in Extensible Markup Language (XML) 1.0 (Fifth Edition), certain predefined entity references. But it says that these entities are predefined (in the same way that &copy; is predefined); not that they must be escaped:


4.6 Predefined Entities

[Definition: Entity and character references may both be used to escape the left angle bracket, ampersand, and other delimiters. A set of general entities (amp, lt, gt, apos, quot) is specified for this purpose. Numeric character references may also be used; they are expanded immediately when recognized and must be treated as character data, so the numeric character references " &#60; " and " &#38; " may be used to escape < and & when they occur in character data.]

What characters must be escaped into entity references in HTML?
What characters must be escaped into entity references in XML?

Update

From Extensible Markup Language (XML) 1.0 (Fifth Edition):


2.4 Character Data and Markup

The ampersand character (&) and the left angle bracket (<) must not appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section.
If they are needed elsewhere, they must be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively.

The right angle bracket (>) may be represented using the string "&gt;", and must, for compatibility, be escaped using either "&gt;" or a character reference when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section.

To allow attribute values to contain both single and double quotes, the apostrophe or single-quote character (') may be represented as "&apos;", and the double-quote character (") as "&quot;".

I read the XML specification as saying:

  • < (&lt;) must be escaped
  • & (&amp;) must be escaped
  • > (&gt;) may be escaped, and must be escaped when it appears as part of ]]>

And that ' and " don't have to be escaped at all; unless you want to have quotes inside quoted attributes.
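
The reading above can be sketched in Python. This is a minimal illustration, not a reference implementation: the helper names `escape_xml_text` and `escape_xml_attr` are hypothetical, and the attribute helper assumes the value will be wrapped in double quotes. The standard-library parser is used only to check that the escaped document round-trips.

```python
import xml.etree.ElementTree as ET


def escape_xml_text(text: str) -> str:
    """Escape character data using only the replacements section 2.4 requires."""
    return (
        text.replace("&", "&amp;")      # first, so the entities added below are not re-escaped
            .replace("<", "&lt;")
            .replace("]]>", "]]&gt;")   # '>' only has to be escaped in this sequence
    )


def escape_xml_attr(value: str) -> str:
    """Escape a value that will be placed between double quotes (an assumption)."""
    return value.replace("&", "&amp;").replace("<", "&lt;").replace('"', "&quot;")


sample = 'if a < b && c > d then "x" ]]> \'y\''
doc = '<root attr="{}">{}</root>'.format(escape_xml_attr(sample), escape_xml_text(sample))
root = ET.fromstring(doc)

print(doc)
print(root.text == sample, root.get("attr") == sample)
```

Note that the lone `>` and the apostrophe are left untouched and the document still parses, which matches the "may" wording in the quoted section.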

From the HTML 4.01 Specification, HTML Document Representation:


5.3.2 Character entity references

Authors wishing to put the "<" character in text should use "&lt;" (ASCII decimal 60) to avoid possible confusion with the beginning of a tag (start tag open delimiter).

Similarly, authors should use "&gt;" (ASCII decimal 62) in text instead of ">" to avoid problems with older user agents that incorrectly perceive this as the end of a tag (tag close delimiter) when it appears in quoted attribute values.

Authors should use "&amp;" (ASCII decimal 38) instead of "&" to avoid confusion with the beginning of a character reference (entity reference open delimiter). Authors should also use "&amp;" in attribute values since character references are allowed within CDATA attribute values.

Some authors use the character entity reference "&quot;" to encode instances of the double quote mark (") since that character may be used to delimit attribute values.

HTML is much more wishy-washy on the rules, but it sounds like I should:


  • < should be replaced with &lt;
  • > should be replaced with &gt;
  • & should be replaced with &amp;
  • " should be replaced with &quot;

And if " can be an entity reference, I should also replace ' with &apos;.

From HTML5 - A vocabulary and associated APIs for HTML and XHTML:


8.3 Serializing HTML fragments

Escaping a string (for the purposes of the algorithm above) consists of running the following steps:

Replace any occurrence of the "&" character by the string "&amp;".

Replace any occurrences of the U+00A0 NO-BREAK SPACE character by the string "&nbsp;".

If the algorithm was invoked in the attribute mode, replace any occurrences of the """ character by the string "&quot;".

If the algorithm was not invoked in the attribute mode, replace any occurrences of the "<" character by the string "&lt;", and any occurrences of the ">" character by the string "&gt;".

I read this as saying that, in HTML, I should replace:


  • & by &amp; always
  • U+00A0 (no-break space) by &nbsp; always
  • " by &quot; if it's inside an attribute
  • < by &lt; if it's not in an attribute (i.e. attributes can contain <)
  • > by &gt; if it's not in an attribute (i.e. attributes can contain >)
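
The escaping steps quoted from section 8.3 translate almost line for line into code. The following is a sketch under the assumption that the quoted text is the whole algorithm; the function name `escape_html_fragment` and its `attribute_mode` parameter are made up for illustration and are not part of any library.

```python
def escape_html_fragment(text: str, attribute_mode: bool = False) -> str:
    """Apply the escaping steps quoted above, in the order they are listed."""
    # Step 1: ampersands first, so the entities added below are not re-escaped.
    text = text.replace("&", "&amp;")
    # Step 2: U+00A0 NO-BREAK SPACE becomes &nbsp; in both modes.
    text = text.replace("\u00a0", "&nbsp;")
    if attribute_mode:
        # Step 3: only the double quote is escaped inside attribute values.
        text = text.replace('"', "&quot;")
    else:
        # Step 4: angle brackets are escaped only outside attribute values.
        text = text.replace("<", "&lt;").replace(">", "&gt;")
    return text


sample = 'a < b & "c"\u00a0> d'
print(escape_html_fragment(sample))
print(escape_html_fragment(sample, attribute_mode=True))
```

The two modes deliberately differ: in text mode the quotes pass through unchanged, and in attribute mode the angle brackets do.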

Recommended answer

First, you're comparing an HTML 4.01 specification with an HTML 5 one. HTML5 ties in more closely with XML than HTML 4.01 ever did (that's why we have XHTML), so this answer will stick to HTML 5 and XML.

Your quoted references are all consistent on the following points:


  • < should always be represented with &lt; when not indicating a processing instruction
  • > should always be represented with &gt; when not indicating a processing instruction
  • & should always be represented with &amp;
  • except when within <![CDATA[ ]]> (which only applies to XML)

I agree 100% with this. You never want the parser to mistake literals for instructions, so it's a solid idea to always encode these characters (the space is a separate matter, see below). Good parsers know that anything contained within <![CDATA[ ]]> is not an instruction, so the encoding is not necessary there.
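
As a small check of the CDATA point, the snippet below parses the same text once inside a CDATA section and once with entity references, using Python's standard-library parser. The element name `note` and the sample text are arbitrary choices for the illustration.

```python
import xml.etree.ElementTree as ET

# The same character data, written two ways.
cdata_doc = "<note><![CDATA[if a < b && c > d]]></note>"
escaped_doc = "<note>if a &lt; b &amp;&amp; c &gt; d</note>"

cdata_text = ET.fromstring(cdata_doc).text
escaped_text = ET.fromstring(escaped_doc).text

print(cdata_text)
print(cdata_text == escaped_text)
```

Both documents yield the same text, so the choice between CDATA and entity references is about how the source is written, not about what the parser hands back.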

In practice, I never encode ' or " unless


  • it appears within the value of an attribute (XML or HTML)
  • it appears within the text of XML tags. (<tag>&quot;Yoinks!&quot;, he said.</tag>)

Both specifications also agree with this.
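
For comparison, Python's standard library draws the same line between text and attribute values through the `quote` flag of `html.escape`. This is only one library's behaviour, shown as an example of the practice described above; the sample string and the `title` attribute are arbitrary.

```python
import html

raw = 'Tom & Jerry\'s "<b>" tag'

# quote=False escapes &, < and > only: enough for element text.
text_safe = html.escape(raw, quote=False)

# quote=True additionally escapes " and ', for use inside attribute values.
attr_safe = html.escape(raw, quote=True)

page = '<p title="{}">{}</p>'.format(attr_safe, text_safe)
print(page)

# Unescaping either form recovers the original string.
print(html.unescape(text_safe) == raw, html.unescape(attr_safe) == raw)
```

Note that `html.escape` writes the apostrophe as the numeric reference `&#x27;` rather than `&apos;`.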

So, the only point of contention is the no-break space (U+00A0, &nbsp;). The only mention of it in either specification is in the context of serialization. Outside of serialization, you should always use the literal character. Unless you are writing your own parser, I don't see the need to do any kind of serialization, so this is beside the point.
