How do you keep .NET XML parsers from expanding parameter entities in XML?


Question

When I try to parse the XML below (with the code below), I keep getting <sgml>&question;&signature;</sgml> expanded to

<sgml>Why couldn’t I publish my books directly in standard SGML? — William Shakespeare.</sgml>

OR

<sgml></sgml>

Since I am working on an XML 3-way merging algorithm, I would like to retrieve the unexpanded <sgml>&question;&signature;</sgml>.

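I have tried:

  • Parsing the XML normally (this results in the sgml tag being expanded)

  • Removing the DOCTYPE at the start of the XML (this results in an empty sgml tag)

  • Various XmlReader DTD settings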

I have the following XML file:

<!DOCTYPE sgml [
  <!ELEMENT sgml ANY>
  <!ENTITY  std       "standard SGML">
  <!ENTITY  signature " &#x2014; &author;.">
  <!ENTITY  question  "Why couldn&#x2019;t I publish my books directly in &std;?">
  <!ENTITY  author    "William Shakespeare">
]>
<sgml>&question;&signature;</sgml>

Here is the code I have tried (several attempts):

using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Reflection;

class Program
{
    static void Main(string[] args)
    {
        string xml = @"C:\src\Apps\Wit\MergingAlgorithmTest\MergingAlgorithmTest\Tests\XMLMerge-DocTypeExpansion\DocTypeExpansion.0.xml";
        var xmlSettingsIgnore = new XmlReaderSettings 
            {
                CheckCharacters = false,
                DtdProcessing = DtdProcessing.Ignore
            };

        var xmlSettingsParse = new XmlReaderSettings
        {
            CheckCharacters = false,
            DtdProcessing = DtdProcessing.Parse
        };

        using (var fs = File.Open(xml, FileMode.Open, FileAccess.Read))
        {
            using (var xmlReaderIgnore = XmlReader.Create(fs, xmlSettingsIgnore))
            {
                // Prevents Exception "Reference to undeclared entity 'question'"
                PropertyInfo propertyInfo = xmlReaderIgnore.GetType().GetProperty("DisableUndeclaredEntityCheck", BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
                propertyInfo.SetValue(xmlReaderIgnore, true, null);

                var doc = XDocument.Load(xmlReaderIgnore);

                Console.WriteLine(doc.Root.ToString()); // outputs <sgml></sgml> not <sgml>&question;&signature;</sgml>
            }// using xml ignore

            fs.Position = 0;
            using (var xmlReaderParse = XmlReader.Create(fs, xmlSettingsParse))
            {
                var doc = XDocument.Load(xmlReaderParse);
                Console.WriteLine(doc.Root.ToString()); // outputs <sgml>Why couldn't I publish my books directly in standard SGML? - William Shakespeare.</sgml> not <sgml>&question;&signature;</sgml>
            }

            fs.Position = 0;
            string parseXmlString = String.Empty;
            using (StreamReader sr = new StreamReader(fs))
            {
                for (int i = 0; i < 7; ++i) // Skip DocType
                    sr.ReadLine();

                parseXmlString = sr.ReadLine();
            }

            using (XmlReader xmlReaderSkip = XmlReader.Create(new StringReader(parseXmlString),xmlSettingsParse))
            {
                // Prevents Exception "Reference to undeclared entity 'question'"
                PropertyInfo propertyInfo = xmlReaderSkip.GetType().GetProperty("DisableUndeclaredEntityCheck", BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
                propertyInfo.SetValue(xmlReaderSkip, true, null);

                var doc2 = XDocument.Load(xmlReaderSkip); // Empty sgml tag

            }
        }//using FileStream
    }
}


Recommended answer

LINQ to XML does not support modeling of entity references; they are automatically expanded to their values (source 1, source 2). There simply is no subclass of XObject defined for a general entity reference.

However, assuming your XML is valid (i.e. the entity references are declared in the DTD, which they are in your example), you can parse it with the old XML Document Object Model. The parser then inserts XmlEntityReference nodes into the XML DOM tree rather than expanding the entity references into plain text:

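        // Requires: using System.Diagnostics; using System.IO; using System.Xml;
        // 'xml' is the file path from the question.
        using (var sr = new StreamReader(xml))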
        using (var xtr = new XmlTextReader(sr))
        {
            xtr.EntityHandling = EntityHandling.ExpandCharEntities; // Expands character entities and returns general entities as System.Xml.XmlNodeType.EntityReference
            var oldDoc = new XmlDocument();
            oldDoc.Load(xtr);
            Debug.WriteLine(oldDoc.DocumentElement.OuterXml); // Outputs <sgml>&question;&signature;</sgml>
            Debug.Assert(oldDoc.DocumentElement.OuterXml.Contains("&question;")); // Verify that the entity references are still there - no assert
            Debug.Assert(oldDoc.DocumentElement.OuterXml.Contains("&signature;")); // Verify that the entity references are still there - no assert
        }

The ChildNodes (https://msdn.microsoft.com/zh-cn/library/system.xml.xmlnode.childnodes%28v=vs.110%29.aspx) of each XmlEntityReference will have the text value of the general entity. If a general entity refers to other general entities, as one does in your case, the corresponding inner XmlEntityReference will be nested in the ChildNodes of the outer one. You can then compare the old and new XML using the old XmlDocument API.

Note that you also need to use the old XmlTextReader and set EntityHandling = EntityHandling.ExpandCharEntities.
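To make the nesting described above concrete, here is a minimal sketch that loads the question's XML from a string and collects the names of all XmlEntityReference nodes in document order, including the ones nested inside other references. The class name EntityReferenceWalker and its members (SampleXml, LoadPreservingEntities, ListEntityReferences) are hypothetical names chosen for illustration; they are not part of the original answer. Setting DtdProcessing and XmlResolver explicitly is an extra precaution of this sketch, on the assumption that only the internal DTD subset is needed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper for illustration only; not part of the original answer.
static class EntityReferenceWalker
{
    // The XML document from the question, joined into a single string.
    public const string SampleXml =
        "<!DOCTYPE sgml [" +
        "<!ELEMENT sgml ANY>" +
        "<!ENTITY std \"standard SGML\">" +
        "<!ENTITY signature \" &#x2014; &author;.\">" +
        "<!ENTITY question \"Why couldn&#x2019;t I publish my books directly in &std;?\">" +
        "<!ENTITY author \"William Shakespeare\">" +
        "]>" +
        "<sgml>&question;&signature;</sgml>";

    // Loads XML text into an XmlDocument while keeping general entity
    // references as XmlEntityReference nodes instead of expanded text.
    public static XmlDocument LoadPreservingEntities(string xmlText)
    {
        using (var sr = new StringReader(xmlText))
        using (var xtr = new XmlTextReader(sr))
        {
            // Assumption of this sketch: only the internal DTD subset is
            // needed, so the DTD is parsed but nothing external is fetched.
            xtr.DtdProcessing = DtdProcessing.Parse;
            xtr.XmlResolver = null;
            xtr.EntityHandling = EntityHandling.ExpandCharEntities;

            var doc = new XmlDocument();
            doc.Load(xtr);
            return doc;
        }
    }

    // Returns the names of all entity references under 'node' in document
    // order, descending into the ChildNodes of each reference.
    public static List<string> ListEntityReferences(XmlNode node)
    {
        var names = new List<string>();
        Collect(node, names);
        return names;
    }

    private static void Collect(XmlNode node, List<string> names)
    {
        if (node.NodeType == XmlNodeType.EntityReference)
            names.Add(node.Name);

        foreach (XmlNode child in node.ChildNodes)
            Collect(child, names);
    }
}
```

Calling EntityReferenceWalker.LoadPreservingEntities(EntityReferenceWalker.SampleXml) and passing the resulting DocumentElement to ListEntityReferences should report the top-level references (question, signature) as well as the nested ones (std, author). For a 3-way merge, comparing nodes by NodeType and Name at this level keeps the references intact rather than comparing their expanded text.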
