Is it possible to have HTML text or CDATA inside an XML attribute?


Problem description

I keep getting "XML parser failure: Unterminated attribute" with my parser when I attempt to put HTML text or CDATA inside my XML attribute. Is there a way to do this or is this not allowed by the standard?

Solution

If an attribute is not a tokenized or enumerated type, it is processed as CDATA. The details for how the attribute is processed can be found in the Extensible Markup Language (XML) 1.0 (Fifth Edition).
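
To answer the question directly: the same specification has a well-formedness constraint that forbids a literal < inside an attribute value, and a CDATA section is markup that may only appear in element content. Raw HTML or a <![CDATA[...]]> block inside an attribute is therefore rejected by a conforming parser, but the same text can be stored once it is escaped. A minimal sketch in Python, assuming the standard library modules xml.etree.ElementTree and xml.sax.saxutils; the <note> element and its body attribute are made-up names for illustration:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Raw markup and CDATA sections are both rejected inside an attribute value.
raw_ok = is_well_formed('<note body="<b>hi</b>"/>')
cdata_ok = is_well_formed('<note body="<![CDATA[hi]]>"/>')

# quoteattr escapes &, < and > and chooses a safe quote character.
html = '<b>Tom & "Jerry"</b>'
doc = f"<note body={quoteattr(html)}/>"

# After parsing, the application sees the original HTML text again.
round_tripped = ET.fromstring(doc).get("body")
```

Here raw_ok and cdata_ok come out False, while round_tripped equals the original html string, so the escaping is invisible to the code that reads the attribute.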

3.3.1 Attribute Types

XML attribute types are of three kinds: a string type, a set of tokenized types, and enumerated types. The string type may take any literal string as a value; the tokenized types are more constrained. The validity constraints noted in the grammar are applied after the attribute value has been normalized as described in 3.3.3 Attribute-Value Normalization.

[54]  AttType       ::=  StringType | TokenizedType | EnumeratedType
[55]  StringType    ::=  'CDATA'
[56]  TokenizedType ::=  'ID'          [VC: ID]
                                       [VC: One ID per Element Type]
                                       [VC: ID Attribute Default]
                       | 'IDREF'       [VC: IDREF]
                       | 'IDREFS'      [VC: IDREF]
                       | 'ENTITY'      [VC: Entity Name]
                       | 'ENTITIES'    [VC: Entity Name]
                       | 'NMTOKEN'     [VC: Name Token]
                       | 'NMTOKENS'    [VC: Name Token]

...

3.3.3 Attribute-Value Normalization

Before the value of an attribute is passed to the application or checked for validity, the XML processor MUST normalize the attribute value by applying the algorithm below, or by using some other method such that the value passed to the application is the same as that produced by the algorithm.

  1. All line breaks MUST have been normalized on input to #xA as described in 2.11 End-of-Line Handling, so the rest of this algorithm operates on text normalized in this way.
  2. Begin with a normalized value consisting of the empty string.
  3. For each character, entity reference, or character reference in the unnormalized attribute value, beginning with the first and continuing to the last, do the following:
    • For a character reference, append the referenced character to the normalized value.
    • For an entity reference, recursively apply step 3 of this algorithm to the replacement text of the entity.
    • For a white space character (#x20, #xD, #xA, #x9), append a space character (#x20) to the normalized value.
    • For another character, append the character to the normalized value.

If the attribute type is not CDATA, then the XML processor MUST further process the normalized attribute value by discarding any leading and trailing space (#x20) characters, and by replacing sequences of space (#x20) characters by a single space (#x20) character.
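
The steps quoted above can be sketched in a few lines of Python. This is only an illustration of the algorithm, not how any real parser implements it: the function name normalize_attribute_value, the entities dictionary (entity name to replacement text) and the is_cdata flag are all hypothetical, and the sketch assumes its input is already well-formed, so a stray & is simply treated as an ordinary character.

```python
import re

# One token per match: hex char ref, decimal char ref, entity ref, or any char.
_TOKEN = re.compile(r"&#x([0-9A-Fa-f]+);|&#([0-9]+);|&([^;&\s]+);|(.)", re.DOTALL)

# Replacement text of the predefined entities, written as character references
# so that they go through the same recursive step as user-declared entities.
PREDEFINED = {"lt": "&#60;", "gt": "&#62;", "amp": "&#38;",
              "apos": "&#39;", "quot": "&#34;"}


def _step3(text, entities, out):
    """Step 3: walk the text and append normalized characters to out."""
    for match in _TOKEN.finditer(text):
        hex_ref, dec_ref, name, char = match.groups()
        if hex_ref is not None:
            out.append(chr(int(hex_ref, 16)))      # character reference
        elif dec_ref is not None:
            out.append(chr(int(dec_ref)))          # character reference
        elif name is not None:
            _step3(entities[name], entities, out)  # entity reference, recurse
        elif char in " \r\n\t":
            out.append(" ")                        # literal white space
        else:
            out.append(char)                       # any other character


def normalize_attribute_value(raw, entities=None, is_cdata=True):
    table = {**PREDEFINED, **(entities or {})}
    text = raw.replace("\r\n", "\n").replace("\r", "\n")  # step 1
    out = []                                               # step 2
    _step3(text, table, out)                               # step 3
    value = "".join(out)
    if not is_cdata:
        # Extra processing for tokenized and enumerated types.
        value = re.sub(" +", " ", value.strip(" "))
    return value
```

For example, normalize_attribute_value("&lt;b&gt;") gives back "<b>", and passing is_cdata=False additionally trims and collapses runs of spaces.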

Note that if the unnormalized attribute value contains a character reference to a white space character other than space (#x20), the normalized value contains the referenced character itself (#xD, #xA or #x9). This contrasts with the case where the unnormalized value contains a white space character (not a reference), which is replaced with a space character (#x20) in the normalized value and also contrasts with the case where the unnormalized value contains an entity reference whose replacement text contains a white space character; being recursively processed, the white space character is replaced with a space character (#x20) in the normalized value.

All attributes for which no declaration has been read SHOULD be treated by a non-validating processor as if declared CDATA.
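
The difference between literal white space and a character reference, and the CDATA treatment of undeclared attributes, can be observed with a real non-validating parser. A small sketch, assuming Python's xml.etree.ElementTree (which is built on expat); the helper attr and the element and attribute names are made up for the example:

```python
import xml.etree.ElementTree as ET


def attr(xml_text, name="x"):
    """Parse xml_text and return the normalized value of one attribute."""
    return ET.fromstring(xml_text).get(name)


# Literal newline and tab characters are replaced by spaces.
literal = attr('<a x="one\ntwo\tthree"/>')

# Character references to the same characters survive normalization.
referenced = attr('<a x="one&#10;two&#9;three"/>')

# No declaration was read, so the attribute is treated as CDATA:
# leading, trailing and repeated spaces are kept as they are.
padded = attr('<a x="  one   two  "/>')
```

So if line breaks in embedded HTML text matter, they have to be written as &#10; rather than as literal newlines.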

It is an error if an attribute value contains a reference to an entity for which no declaration has been read.
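
This last rule is a common stumbling block when HTML text is pasted into an attribute, because HTML entities such as &nbsp; are not declared in plain XML. A brief sketch, again assuming Python's xml.etree.ElementTree; the helper parses and the sample documents are illustrative only:

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if the document is accepted, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# &nbsp; is an HTML entity; without a declaration it is an error in XML.
undeclared = parses('<a x="&nbsp;"/>')

# Declaring the entity in the internal DTD subset makes the reference legal.
declared = parses('<!DOCTYPE a [<!ENTITY nbsp "&#160;">]><a x="&nbsp;"/>')

# The predefined entities never need a declaration.
predefined = parses('<a x="&amp;&lt;"/>')
```

Using a numeric character reference such as &#160; avoids the need for a declaration altogether.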
