(C#) How to modify an attribute's value in an existing XML file without loading or rewriting the whole file?

Question

I'm creating some huge XML files (several GB) with the help of XmlWriter and Linq2Xml. These files have the following structure:

<Table recCount="" recLength="">
<Rec recId="1">..</Rec>
<Rec recId="2">..</Rec>
..
<Rec recId="n">..</Rec>
</Table>

I don't know the values of Table's recCount and recLength attributes until I have written all the inner Rec nodes, so I have to write these attribute values at the very end.

Right now I write all the inner Rec nodes to a temp file, calculate the values of Table's attributes, and then write everything to the resulting file in the form shown above (copying all the Rec nodes over from the temp file).
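For comparison, the temp-file approach described above might look roughly like this minimal sketch. It is not the asker's actual code: the file names, the three sample records and the "Some number:" payload are hypothetical, and recLength is assumed to be the length of the longest record text, as in the answer's code. What it illustrates is the cost: every Rec byte is written to disk twice.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical file names for the sketch.
string tempPath = Path.Combine(Path.GetTempPath(), "recs.tmp");
string outPath = Path.Combine(Path.GetTempPath(), "table.xml");
var utf8 = new UTF8Encoding(false); // no BOM, so the raw byte copy below stays consistent

int recCount = 0;
int maxRecLength = 0;

// Pass 1: write only the <Rec> elements (an XML fragment) to the temp file.
var fragmentSettings = new XmlWriterSettings { ConformanceLevel = ConformanceLevel.Fragment, Encoding = utf8 };
using (var xw = XmlWriter.Create(tempPath, fragmentSettings))
{
    for (int i = 0; i < 3; i++)
    {
        string text = "Some number: " + i;
        maxRecLength = Math.Max(maxRecLength, text.Length);
        xw.WriteStartElement("Rec");
        xw.WriteAttributeString("recId", XmlConvert.ToString(i + 1));
        xw.WriteString(text);
        xw.WriteEndElement();
        recCount++;
    }
}

// Pass 2: now that the values are known, write the header, copy the temp file
// byte for byte, and close the root element.
using (var output = new FileStream(outPath, FileMode.Create))
{
    byte[] header = utf8.GetBytes(
        "<?xml version=\"1.0\" encoding=\"utf-8\"?>" +
        $"<Table recCount=\"{XmlConvert.ToString(recCount)}\" recLength=\"{XmlConvert.ToString(maxRecLength)}\">");
    output.Write(header, 0, header.Length);

    using (var input = File.OpenRead(tempPath))
    {
        input.CopyTo(output); // every Rec byte is written a second time here
    }

    byte[] footer = utf8.GetBytes("</Table>");
    output.Write(footer, 0, footer.Length);
}

File.Delete(tempPath);
```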

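I'd like to know whether there is a way to modify the values of these attributes without writing the content to another file (as I do now) or loading the whole document into memory (which is obviously impossible given the size of these files).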

Recommended answer

Heavily commented code. The basic idea is that in the first pass we write:

<?xml version="1.0" encoding="utf-8"?>
<Table recCount="$1" recLength="$2">
<!--Reserved space:++++++++++++++++-->
<Rec...

Then we go back to the beginning of the file and we rewrite the first three lines:

<?xml version="1.0" encoding="utf-8"?>
<Table recCount="1000" recLength="150">
<!--Reserved space:#############-->

The important "trick" here is that you can't "insert" into a file; you can only overwrite it. So we "reserve" some space for the digits (the <!--Reserved space:...--> comment). There are many ways we could have done this. For example, in the first pass we could have written:

<Table recCount="              " recLength="          ">

and then (XML-legal but ugly, since the trailing spaces become part of the attribute values):

<Table recCount="1000          " recLength="150       ">

Or we could have appended the spaces after the > of the Table start tag:

<Table recCount="" recLength="">                   

(there are 20 spaces after the >)

and then:

<Table recCount="1000" recLength="150">            

(now there are 13 spaces after the >)

Or we could simply have put the spaces on a line of their own, without the <!-- --> comment around them.
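The arithmetic behind the trailing-space variant can be checked with a small sketch. It assumes the header is pure ASCII (so in UTF-8 one character is one byte); PaddedTableHeader is a hypothetical helper, and the total width of 52 is simply the 32-character empty tag plus the 20 reserved spaces from the example above.

```csharp
using System;

// Hypothetical helper: builds the Table start tag followed by enough spaces
// to reach a fixed total width (the "spaces after the >" variant).
static string PaddedTableHeader(string recCount, string recLength, int totalWidth)
{
    string tag = $"<Table recCount=\"{recCount}\" recLength=\"{recLength}\">";
    if (tag.Length > totalWidth)
        throw new ArgumentException("Reserved space is too small for these values.");
    return tag.PadRight(totalWidth); // pads with trailing spaces
}

string firstPass = PaddedTableHeader("", "", 52);
string secondPass = PaddedTableHeader("1000", "150", 52);

// Both versions occupy exactly the same number of characters, so the
// second one can overwrite the first in place.
Console.WriteLine(firstPass.Length);                                 // 52
Console.WriteLine(secondPass.Length);                                // 52
Console.WriteLine(firstPass.Length - firstPass.TrimEnd().Length);    // 20
Console.WriteLine(secondPass.Length - secondPass.TrimEnd().Length);  // 13
```

A real implementation would derive the width from the largest values it must support, for example 10 digits per int attribute, as the code below does for its reserved comment.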

Code:

using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

int maxRecCountLength = 10; // int.MaxValue.ToString().Length
int maxRecLengthLength = 10; // int.MaxValue.ToString().Length
int tokenLength = 4; // 4 == "$1".Length + "$2".Length, see below what $1 and $2 are
// The reserved space is 10 + 10 - 4 = 16 '+' characters: enough room for
// both placeholders to grow to 10 digits each.

string reservedSpace = new string('+', maxRecCountLength + maxRecLengthLength - tokenLength);

// You have to manually open the FileStream
using (var fs = new FileStream("out.xml", FileMode.Create))

// and add a StreamWriter on top of it
using (var sw = new StreamWriter(fs, Encoding.UTF8, 4096, true))
{
    // Write to the StreamWriter however you want.
    // Note that recCount and recLength get the placeholders $1 and $2.
    int recCount = 0;
    int maxRecLength = 0;

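    // Pin the newline sequence: WriteWhitespace("\r\n") is normalized to NewLineChars
    // (which may default to Environment.NewLine, i.e. "\n" on Linux), and the rewrite
    // below assumes every line ends with exactly "\r\n".
    var settings = new XmlWriterSettings { NewLineChars = "\r\n" };
    using (var xw = XmlWriter.Create(sw, settings))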
    {
        xw.WriteWhitespace("\r\n");
        xw.WriteStartElement("Table");
        xw.WriteAttributeString("recCount", "$1");
        xw.WriteAttributeString("recLength", "$2");

        // Reserve some space (here, a run of '+' inside a comment) that will be
        // partially consumed when $1 and $2 are replaced by the real values
        xw.WriteWhitespace("\r\n");
        xw.WriteComment("Reserved space:" + reservedSpace);

        // <--------- BEGIN YOUR CODE
        for (int i = 0; i < 100; i++)
        {
            xw.WriteWhitespace("\r\n");
            xw.WriteStartElement("Rec");

            string str = string.Format("Some number: {0}", i);
            if (str.Length > maxRecLength)
            {
                maxRecLength = str.Length;
            }
            xw.WriteValue(str);

            recCount++;

            xw.WriteEndElement();
        }
        // <--------- END YOUR CODE

        xw.WriteWhitespace("\r\n");
        xw.WriteEndElement();
    }

    sw.Flush();

    // Now we read the first lines to modify them (normally we will
    // read three lines: the XML declaration, the <Table element and the
    // <!--Reserved space: comment)
    fs.Position = 0;

    var lines = new List<string>();

    using (var sr = new StreamReader(fs, sw.Encoding, false, 4096, true))
    {
        while (true)
        {
            string str = sr.ReadLine();
            if (str == null)
            {
                throw new InvalidDataException("<Table start tag not found.");
            }
            lines.Add(str);

            if (str.StartsWith("<Table"))
            {
                // We read the next line, the comment line
                str = sr.ReadLine();
                lines.Add(str);
                break;
            }
        }
    }

    string strCount = XmlConvert.ToString(recCount);
    string strMaxRecLength = XmlConvert.ToString(maxRecLength);

    // We do some replaces for the tokens
    int oldLen = lines[lines.Count - 2].Length;
    lines[lines.Count - 2] = lines[lines.Count - 2].Replace("=\"$1\"", string.Format("=\"{0}\"", strCount));
    lines[lines.Count - 2] = lines[lines.Count - 2].Replace("=\"$2\"", string.Format("=\"{0}\"", strMaxRecLength));
    int newLen = lines[lines.Count - 2].Length;

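    // Shrink the '+' run by exactly as many characters as the <Table line grew,
    // writing '#' instead, so the two lines keep their combined length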
    lines[lines.Count - 1] = lines[lines.Count - 1].Replace(":" + reservedSpace, ":" + new string('#', reservedSpace.Length - newLen + oldLen));

    // We move back to just after the UTF8/UTF16 preamble
    fs.Position = sw.Encoding.GetPreamble().Length;

    // And we rewrite the lines
    foreach (string str in lines)
    {
        sw.Write(str);
        sw.Write("\r\n");
    }
}
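Because the values now live in the root start tag, a consumer can read them back without loading the file. Here is a minimal sketch using XmlReader, which is forward-only: ReadTableHeader is a hypothetical helper, and the sample file it reads is a tiny stand-in for out.xml, written in the shape the code above produces.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper: reads only the root element's attributes. XmlReader is
// forward-only and we stop right after the <Table> start tag, so the Rec
// nodes that follow are never parsed.
static (string recCount, string recLength) ReadTableHeader(string path)
{
    using var reader = XmlReader.Create(path);
    reader.MoveToContent(); // skips the XML declaration and any leading whitespace/comments
    if (reader.NodeType != XmlNodeType.Element || reader.LocalName != "Table")
        throw new InvalidDataException("Root element is not <Table>.");
    return (reader.GetAttribute("recCount"), reader.GetAttribute("recLength"));
}

// Tiny stand-in for the patched out.xml.
string path = Path.Combine(Path.GetTempPath(), "header-check.xml");
File.WriteAllText(path,
    "<?xml version=\"1.0\" encoding=\"utf-8\"?>\r\n" +
    "<Table recCount=\"1000\" recLength=\"150\">\r\n" +
    "<!--Reserved space:#############-->\r\n" +
    "<Rec>Some number: 0</Rec>\r\n" +
    "</Table>");

var (count, length) = ReadTableHeader(path);
Console.WriteLine($"recCount={count}, recLength={length}"); // recCount=1000, recLength=150
```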

The slower .NET 3.5 way

In .NET 3.5, StreamReader and StreamWriter always close the underlying FileStream when they are disposed (the constructor overloads with a leaveOpen parameter were only added in .NET 4.5), so I have to reopen the file several times. This is a little slower.

using System.Collections.Generic;
using System.IO;
using System.Xml;

int maxRecCountLength = 10; // int.MaxValue.ToString().Length
int maxRecLengthLength = 10; // int.MaxValue.ToString().Length
int tokenLength = 4; // 4 == "$1".Length + "$2".Length, see below what $1 and $2 are
// The reserved space is 10 + 10 - 4 = 16 '+' characters

string reservedSpace = new string('+', maxRecCountLength + maxRecLengthLength - tokenLength);
string fileName = "out.xml";

int recCount = 0;
int maxRecLength = 0;

using (var sw = new StreamWriter(fileName))
{
    // Write to the StreamWriter however you want.
    // Note that recCount and recLength get the placeholders $1 and $2.
    using (var xw = XmlWriter.Create(sw))
    {
        xw.WriteWhitespace("\r\n");
        xw.WriteStartElement("Table");
        xw.WriteAttributeString("recCount", "$1");
        xw.WriteAttributeString("recLength", "$2");

        // Reserve some space (here, a run of '+' inside a comment) that will be
        // partially consumed when $1 and $2 are replaced by the real values
        xw.WriteWhitespace("\r\n");
        xw.WriteComment("Reserved space:" + reservedSpace);

        // <--------- BEGIN YOUR CODE
        for (int i = 0; i < 100; i++)
        {
            xw.WriteWhitespace("\r\n");
            xw.WriteStartElement("Rec");

            string str = string.Format("Some number: {0}", i);
            if (str.Length > maxRecLength)
            {
                maxRecLength = str.Length;
            }
            xw.WriteValue(str);

            recCount++;

            xw.WriteEndElement();
        }
        // <--------- END YOUR CODE

        xw.WriteWhitespace("\r\n");
        xw.WriteEndElement();
    }
}

var lines = new List<string>();

using (var sr = new StreamReader(fileName))
{
    // Now we read the first lines to modify them (normally we will
    // read three lines: the XML declaration, the <Table element and the
    // <!--Reserved space: comment)

    while (true)
    {
        string str = sr.ReadLine();
        if (str == null)
        {
            throw new InvalidDataException("<Table start tag not found.");
        }
        lines.Add(str);

        if (str.StartsWith("<Table"))
        {
            // We read the next line, the comment line
            str = sr.ReadLine();
            lines.Add(str);
            break;
        }
    }
}

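// Open the file with File.OpenWrite (FileMode.OpenOrCreate, no truncation)
// rather than new StreamWriter(fileName), which would truncate it: we only
// want to overwrite the first few lines in place.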
using (var fs = File.OpenWrite(fileName))
using (var sw = new StreamWriter(fs))
{
    string strCount = XmlConvert.ToString(recCount);
    string strMaxRecLength = XmlConvert.ToString(maxRecLength);

    // We do some replaces for the tokens
    int oldLen = lines[lines.Count - 2].Length;
    lines[lines.Count - 2] = lines[lines.Count - 2].Replace("=\"$1\"", string.Format("=\"{0}\"", strCount));
    lines[lines.Count - 2] = lines[lines.Count - 2].Replace("=\"$2\"", string.Format("=\"{0}\"", strMaxRecLength));
    int newLen = lines[lines.Count - 2].Length;

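    // Shrink the '+' run by exactly as many characters as the <Table line grew,
    // writing '#' instead, so the two lines keep their combined length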
    lines[lines.Count - 1] = lines[lines.Count - 1].Replace(":" + reservedSpace, ":" + new string('#', reservedSpace.Length - newLen + oldLen));

    // We move back to just after the UTF8/UTF16 preamble
    sw.BaseStream.Position = sw.Encoding.GetPreamble().Length;

    // And we rewrite the lines
    foreach (string str in lines)
    {
        sw.Write(str);
        sw.Write("\r\n");
    }
}
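Both versions depend on the rewritten lines occupying exactly as many bytes as the lines they replace: a longer replacement silently overwrites the start of the next Rec, and a shorter one leaves stale bytes behind. A defensive sketch of that final step, assuming the byte offset of the reserved region is known and the text is encoded with a known Encoding (PatchInPlace is a hypothetical helper, not part of .NET), checks both conditions before touching the file:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical helper: overwrites oldText with newText at a known byte offset, but only
// if both encode to the same number of bytes and the bytes on disk really are oldText.
// Nothing outside that region is read or rewritten.
static void PatchInPlace(string path, long offset, string oldText, string newText, Encoding encoding)
{
    // GetBytes never includes the preamble (BOM), so these are the raw text bytes.
    byte[] oldBytes = encoding.GetBytes(oldText);
    byte[] newBytes = encoding.GetBytes(newText);
    if (oldBytes.Length != newBytes.Length)
        throw new InvalidOperationException(
            $"Replacement is {newBytes.Length} bytes but the reserved region is {oldBytes.Length} bytes.");

    using var fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite);

    // Read back the current bytes and make sure the placeholder is where we expect it.
    fs.Position = offset;
    var onDisk = new byte[oldBytes.Length];
    int read = 0;
    while (read < onDisk.Length)
    {
        int n = fs.Read(onDisk, read, onDisk.Length - read);
        if (n == 0) throw new EndOfStreamException("File ends inside the reserved region.");
        read += n;
    }
    if (!onDisk.SequenceEqual(oldBytes))
        throw new InvalidDataException("The expected placeholder was not found at that offset.");

    // Same length, same position: a plain overwrite leaves the rest of the file untouched.
    fs.Position = offset;
    fs.Write(newBytes, 0, newBytes.Length);
}

// Usage with a tiny sample file (no BOM, so the header starts at offset 0).
var utf8NoBom = new UTF8Encoding(false);
string path = Path.Combine(Path.GetTempPath(), "patch-demo.xml");
string placeholder = "<Table recCount=\"\" recLength=\"\">" + new string(' ', 20);
string body = "\r\n<Rec>Some number: 0</Rec>\r\n</Table>";
File.WriteAllText(path, placeholder + body, utf8NoBom);

string patched = "<Table recCount=\"1000\" recLength=\"150\">".PadRight(placeholder.Length);
PatchInPlace(path, 0, placeholder, patched, utf8NoBom);
Console.WriteLine(File.ReadAllText(path) == patched + body); // True
```

With an encoding that writes a BOM, such as Encoding.UTF8, the offset would be encoding.GetPreamble().Length rather than 0, as in the answer's code.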
