XDocument.Save() removes my &#xA; entities


Problem description

I wrote a tool to repair some XML files (i.e., insert some attributes/values that were missing) using C# and Linq-to-XML. The tool loads an existing XML file into an XDocument object. Then, it parses down through the node to insert the missing data. After that, it calls XDocument.Save() to save the changes out to another directory.

All of that is just fine except for one thing: any &#xA; entities that are in the text in the XML file are replaced with a new line character. The entity represents a new line, of course, but I need to preserve the entity in the XML because another consumer needs it in there.

Is there any way to save the modified XDocument without losing the &#xA; entities?

Thank you.

Recommended answer

The &#xA; entities are technically called "numeric character references" in XML, and they are resolved when the original document is loaded into the XDocument. This makes your issue problematic to solve, since there is no way of distinguishing resolved whitespace entities from insignificant whitespace (typically used for formatting XML documents for plain-text viewers) after the XDocument has been loaded. Thus, the below only applies if your document does not have any insignificant whitespace.
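
To see the resolution step in isolation, here is a minimal sketch. The class name LoadDemo and the sample document are made up for illustration. It parses a document containing &#xA; and shows that the XDocument only holds a plain '\n', so a default save has no entity left to write back.

```csharp
using System.Xml.Linq;

// Hypothetical helper, not part of the original answer.
public static class LoadDemo
{
    private const string Sample = "<root>line1&#xA;line2</root>";

    // The parser resolves the numeric character reference on load,
    // so the text node contains a literal line feed.
    public static string LoadedValue()
    {
        XDocument doc = XDocument.Parse(Sample);
        return doc.Root.Value;
    }

    // Serializing with the default writer emits a line break
    // (the exact characters depend on the platform), not "&#xA;".
    public static string SavedXml()
    {
        XDocument doc = XDocument.Parse(Sample);
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

LoadedValue() returns the two lines separated by a single '\n', and the string returned by SavedXml() no longer contains the text "&#xA;".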

The System.Xml library allows one to preserve whitespace entities by setting the NewLineHandling property of the XmlWriterSettings class to Entitize. However, within text nodes, this would only entitize \r to &#xD;, and not \n to &#xA;.
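
The limitation can be checked with a short sketch like the one below. The class name EntitizeDemo is hypothetical, and OmitXmlDeclaration is set only to keep the output short. It saves a document through an XmlWriter configured with NewLineHandling.Entitize and returns the resulting markup.

```csharp
using System.Text;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper, not part of the original answer.
public static class EntitizeDemo
{
    public static string Save(string xml)
    {
        XDocument document = XDocument.Parse(xml);

        var settings = new XmlWriterSettings
        {
            NewLineHandling = NewLineHandling.Entitize,
            OmitXmlDeclaration = true // illustration only: shorter output
        };

        var output = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(output, settings))
            document.Save(writer);

        return output.ToString();
    }
}
```

Calling EntitizeDemo.Save("<root>a&#xD;b&#xA;c</root>") should give markup in which the carriage return is written as &#xD; while the line feed in the text node appears as a raw character, which is why this setting alone does not solve the problem.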

The easiest solution is to derive from the XmlWriter class and override its WriteString method to manually replace the whitespace characters with their numeric character entities. The WriteString method also happens to be the place where .NET entitizes characters that are not permitted to appear in text nodes, such as the syntax markers &, <, and >, which are respectively entitized to &amp;, &lt;, and &gt;.

Since XmlWriter is abstract, we shall derive from XmlTextWriter in order to avoid having to implement all the abstract methods of the former class. Here is a quick-and-dirty implementation:

using System.IO;
using System.Xml;

public class EntitizingXmlWriter : XmlTextWriter
{
    public EntitizingXmlWriter(TextWriter writer) :
        base(writer)
    { }

    public override void WriteString(string text)
    {
        foreach (char c in text)
        {
            switch (c)
            {
                case '\r':
                case '\n':
                case '\t':
                    base.WriteCharEntity(c);
                    break;
                default:
                    base.WriteString(c.ToString());
                    break;
            }
        }
    }
}

If intended for use in a production environment, you’d want to do away with the c.ToString() part, since it’s very inefficient. You can optimize the code by batching substrings of the original text that do not contain any of the characters you want to entitize, and feeding them together into a single base.WriteString call.

A word of warning: The following naive implementation will not work, since the base WriteString method would replace any & characters with &amp;, thereby causing \n to be expanded to &amp;#xA; (and likewise \r to &amp;#xD; and \t to &amp;#x9;).

    public override void WriteString(string text)
    {
        text = text.Replace("\r", "&#xD;");
        text = text.Replace("\n", "&#xA;");
        text = text.Replace("\t", "&#x9;");
        base.WriteString(text);
    }
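
The double escaping is easy to reproduce with a plain XmlTextWriter. The sketch below uses a made-up class name, DoubleEscapeDemo, and a made-up element name, root. It passes an already-entitized string to WriteString and returns what ends up in the output.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper, not part of the original answer.
public static class DoubleEscapeDemo
{
    public static string Write(string text)
    {
        using (var stringWriter = new StringWriter())
        {
            using (var xmlWriter = new XmlTextWriter(stringWriter))
            {
                xmlWriter.WriteStartElement("root");
                // WriteString treats its argument as raw text, so the
                // leading '&' of "&#xA;" is itself escaped to "&amp;".
                xmlWriter.WriteString(text);
                xmlWriter.WriteEndElement();
            }
            return stringWriter.ToString();
        }
    }
}
```

DoubleEscapeDemo.Write("&#xA;") produces an element whose content is &amp;#xA;, which a reader would interpret as the five literal characters rather than as a line feed.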



Finally, to save your XDocument into a destination file or stream, just use the following snippet:

using (var textWriter = new StreamWriter(destination))
using (var xmlWriter = new EntitizingXmlWriter(textWriter))
    document.Save(xmlWriter);



Hope this helps!

Edit: For reference, here is an optimized version of the overridden WriteString method:

public override void WriteString(string text)
{
    // The start index of the next substring containing only non-entitized characters.
    int start = 0;

    // The index of the current character being checked.
    for (int curr = 0; curr < text.Length; ++curr)
    {
        // Check whether the current character should be entitized.
        char chr = text[curr];
        if (chr == '\r' || chr == '\n' || chr == '\t')
        {
            // Write the previous substring of non-entitized characters.
            if (start < curr)
                base.WriteString(text.Substring(start, curr - start));

            // Write current character, entitized.
            base.WriteCharEntity(chr);

            // Next substring of non-entitized characters tentatively starts
            // immediately beyond current character.
            start = curr + 1;
        }
    }

    // Write the trailing substring of non-entitized characters.
    if (start < text.Length)
        base.WriteString(text.Substring(start, text.Length - start));
}
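
For a quick end-to-end check, the following self-contained sketch may be useful. It is a variant of the writer above, not the answer's exact code. The names EntitizingXmlWriterSketch and RoundTripCheck are hypothetical, the batching uses String.IndexOfAny instead of a hand-written loop, and the output goes to a StringWriter so the result can be inspected as a string.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical variant of the writer above; same intent, different loop.
public class EntitizingXmlWriterSketch : XmlTextWriter
{
    private static readonly char[] Entitized = { '\r', '\n', '\t' };

    public EntitizingXmlWriterSketch(TextWriter writer) : base(writer) { }

    public override void WriteString(string text)
    {
        if (string.IsNullOrEmpty(text))
            return;

        int start = 0;
        int hit;
        // Jump from one entitized character to the next.
        while ((hit = text.IndexOfAny(Entitized, start)) >= 0)
        {
            // Write the run of ordinary characters before the hit.
            if (hit > start)
                base.WriteString(text.Substring(start, hit - start));

            // Write the whitespace character as a numeric character reference.
            base.WriteCharEntity(text[hit]);
            start = hit + 1;
        }

        // Write whatever remains after the last hit.
        if (start < text.Length)
            base.WriteString(text.Substring(start));
    }
}

public static class RoundTripCheck
{
    // Parses the given XML and saves it back through the sketch writer.
    public static string Save(string xml)
    {
        XDocument document = XDocument.Parse(xml);
        using (var textWriter = new StringWriter())
        {
            using (var xmlWriter = new EntitizingXmlWriterSketch(textWriter))
                document.Save(xmlWriter);
            return textWriter.ToString();
        }
    }
}
```

With this in place, RoundTripCheck.Save("<root>a &amp; b&#xA;c</root>") should return markup that still contains a &amp; b&#xA;c, showing that the line feed is written back as an entity while the ampersand is escaped only once.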
