Force XML character entities into XmlDocument


Question

I have some XML that looks like this:

<abc x="{"></abc>

I want to force XmlDocument to write the brace as an XML character reference, i.e.:

<abc x="&#123;"></abc>

MSDN says this:

In order to assign an attribute value that contains entity references, the user must create an XmlAttribute node plus any XmlText and XmlEntityReference nodes, build the appropriate subtree and use SetAttributeNode to assign it as the value of an attribute.

CreateEntityReference sounded promising, so I tried this:

XmlDocument doc = new XmlDocument();
doc.LoadXml("<abc />");
XmlAttribute x = doc.CreateAttribute("x");
x.AppendChild(doc.CreateEntityReference("#123"));
doc.DocumentElement.Attributes.Append(x);

This throws the exception "Cannot create an 'EntityReference' node with a name starting with '#'."

Any reason why CreateEntityReference doesn't like the '#' - and more importantly how can I get the character entity into XmlDocument's XML? Is it even possible? I'm hoping to avoid string manipulation of the OuterXml...

Solution

You're mostly out of luck.

First off, what you're dealing with are called character references, not entity references, which is why CreateEntityReference fails. The sole reason for a character reference to exist is to provide access to characters that would be illegal in a given context or otherwise difficult to create.

Definition: A character reference refers to a specific character in the ISO/IEC 10646 character set, for example one not directly accessible from available input devices.

(See section 4.1 of the XML spec)

When an XML processor encounters a character reference in the value of an attribute (that is, when the &#xxx; form is used inside an attribute), the reference is treated as "Included", which means its value is looked up and the reference is replaced by the character it refers to.

The string "AT&amp;T;" expands to "AT&T;" and the remaining ampersand is not recognized as an entity-reference delimiter.

(See section 4.4 of the XML spec)

This is baked into the XML spec and the Microsoft XML stack is doing what it's required to do: process character references.
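
To see this expansion happen, here is a minimal sketch (the class and method names are made up for illustration). It loads markup that contains &#123; and reads the attribute back. By the time the document is in memory, only the literal character is left.

```csharp
using System;
using System.Xml;

public static class CharRefExpansionDemo
{
    // Hypothetical helper: loads the markup and returns the value of attribute "x"
    // as XmlDocument stores it after parsing.
    public static string LoadAndReadAttribute(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        // The parser has already replaced the character reference
        // with the character it refers to.
        return doc.DocumentElement.GetAttribute("x");
    }

    public static void Main()
    {
        string xml = "<abc x=\"&#123;\"></abc>";
        Console.WriteLine(LoadAndReadAttribute(xml)); // the literal brace character

        var doc = new XmlDocument();
        doc.LoadXml(xml);
        // The serialized form no longer contains the character reference.
        Console.WriteLine(doc.OuterXml);
    }
}
```

Nothing in the DOM records that the brace was originally written as a reference, so there is nothing for OuterXml to reproduce.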

The best I can see you doing is to take a peek at these old XML.com articles, one of which uses XSL to disable output escaping so &amp;#123; would turn into &#123; in the output.
http://www.xml.com/pub/a/2001/03/14/trxml10.html

<!DOCTYPE stylesheet [
<!ENTITY ntilde 
"<xsl:text disable-output-escaping='yes'>&amp;ntilde;</xsl:text>">
]>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">

  <xsl:output doctype-system="testOut.dtd"/>

  <xsl:template match="test">
    <testOut>
      The Spanish word for "Spain" is "Espa&ntilde;a".
      <xsl:apply-templates/>
    </testOut>
  </xsl:template>

</xsl:stylesheet>
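
If you want to try the disable-output-escaping idea from .NET, the following is a rough sketch rather than a tested recipe. The stylesheet, the `<out>` element and the class name are all invented for the example. It relies on XslCompiledTransform writing to a TextWriter, so that the XSLT serializer produces the output. Note that XSLT 1.0 only allows disable-output-escaping on text nodes, so this can put &#123; into element content but not into an attribute value like the one in the question.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class DisableOutputEscapingDemo
{
    // Hypothetical stylesheet: emits the six characters "&#123;" unescaped inside <out>.
    private const string Stylesheet =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>" +
        "<xsl:template match='/'>" +
        "<out><xsl:text disable-output-escaping='yes'>&amp;#123;</xsl:text></out>" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    public static string Transform(string inputXml)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(styleReader);
        }

        using (XmlReader input = XmlReader.Create(new StringReader(inputXml)))
        using (var output = new StringWriter())
        {
            // Transforming to a TextWriter lets the XSLT serializer
            // apply disable-output-escaping.
            xslt.Transform(input, null, output);
            return output.ToString();
        }
    }

    public static void Main()
    {
        Console.WriteLine(Transform("<test/>"));
    }
}
```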

And this one, which uses an XSLT 2.0 character map to convert specific characters into other text sequences on output (to accomplish the same goal as the previous link).
http://www.xml.com/lpt/a/1426

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

  <xsl:output use-character-maps="cm1"/>

  <xsl:character-map name="cm1">
    <xsl:output-character character="&#160;" string="&amp;nbsp;"/>
    <xsl:output-character character="&#233;" string="&amp;#233;"/> <!-- é -->
    <xsl:output-character character="ô" string="&amp;#244;"/>
    <xsl:output-character character="&#8212;" string="--"/>
  </xsl:character-map>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
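
One more option, which is not part of the answer above and bypasses XmlDocument entirely: if you control the serialization yourself, XmlWriter.WriteCharEntity forces a character reference into the output. The sketch below is only an illustration under that assumption (the class and method names are made up). The writer emits the hexadecimal form of the reference rather than the decimal &#123;, which any XML parser treats as equivalent.

```csharp
using System;
using System.Text;
using System.Xml;

public static class CharEntityWriterDemo
{
    // Hypothetical helper: writes <abc x="..."/> by hand, forcing the brace
    // in the attribute value to be emitted as a character reference.
    public static string WriteWithCharEntity()
    {
        var sb = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };

        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartElement("abc");
            writer.WriteStartAttribute("x");
            writer.WriteCharEntity('{'); // emitted as a hexadecimal character reference
            writer.WriteEndAttribute();
            writer.WriteEndElement();
        }

        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(WriteWithCharEntity());
    }
}
```

The trade-off is that you have to walk the document and decide per character what to write. As soon as the result is loaded back into an XmlDocument, the reference is expanded again.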
