(C#) How to modify an attribute's value in an existing XML file without loading or rewriting the whole file?
Problem description
I'm creating some huge XML files (several GB) with the help of XmlWriter and Linq2Xml. These files have the following structure:
<Table recCount="" recLength="">
<Rec recId="1">..</Rec>
<Rec recId="2">..</Rec>
..
<Rec recId="n">..</Rec>
</Table>
I don't know the values of Table's recCount and recLength attributes until I have written all the inner Rec nodes, so I have to write these attribute values at the very end.
Right now I write all the inner Rec nodes to a temp file, calculate the values of Table's attributes, and then write everything, in the form shown above, to the resulting file (copying all the Rec nodes over from the temp file).
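I'd like to know whether there is a way to modify the values of these attributes without writing the content to another file (as I do now) or loading the whole document into memory (which is obviously impossible given the size of these files).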
Recommended answer
Heavily commented code. The basic idea is that in the first pass we write:
<?xml version="1.0" encoding="utf-8"?>
<Table recCount="$1" recLength="$2">
<!--Reserved space:++++++++++++++++-->
<Rec...
Then we go back to the beginning of the file and we rewrite the first three lines:
<?xml version="1.0" encoding="utf-8"?>
<Table recCount="1000" recLength="150">
<!--Reserved space:#############-->
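Rewriting those lines works only because the replacement occupies exactly as many bytes as the placeholder: a FileStream lets you seek to a position and overwrite bytes there, but it cannot shift the rest of the file. A minimal sketch of such an in-place overwrite, assuming a small throwaway demo file (its name and contents are made up for illustration; the top-level statements need C# 9 or later):

```csharp
using System;
using System.IO;
using System.Text;

// Throwaway demo file; the name and contents are made up for illustration.
string path = Path.Combine(Path.GetTempPath(), "overwrite-demo.txt");
File.WriteAllText(path, "HEADER:++++\r\nbody stays where it is\r\n", new UTF8Encoding(false));
long lengthBefore = new FileInfo(path).Length;

// Overwrite exactly as many bytes as the placeholder "HEADER:++++" occupies.
byte[] patch = Encoding.ASCII.GetBytes("HEADER:1234");
using (var fs = new FileStream(path, FileMode.Open, FileAccess.Write)) // Open does not truncate
{
    fs.Position = 0;                  // the placeholder starts at byte 0 here
    fs.Write(patch, 0, patch.Length); // bytes after the patch are left untouched
}

string after = File.ReadAllText(path);
long lengthAfter = new FileInfo(path).Length;
File.Delete(path);
Console.WriteLine(after.Replace("\r\n", " | "));
```

The file length does not change; a patch longer than the placeholder would silently overwrite the first bytes of the body instead of pushing them forward.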
The important "trick" here is that you can't "insert" into a file; you can only overwrite it. So we "reserve" some space for the digits (the Reserved space:############# comment). There are many ways we could have done it. For example, in the first pass we could have:
<Table recCount="          " recLength="          ">
and then (xml-legal but ugly):
<Table recCount="1000      " recLength="150       ">
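A sketch of the padded-value variant, assuming a fixed width of 10 characters per value (int.MaxValue has 10 digits, the same 10 as maxRecCountLength in the code below). LINQ to XML is used only to show how a consumer would read the result: the padded tag is exactly as long as the all-space placeholder, and because a parser keeps trailing spaces as part of an undeclared (CDATA) attribute value, the reader has to trim before converting. The Pad helper is hypothetical.

```csharp
using System;
using System.Globalization;
using System.Xml.Linq;

const int Width = 10; // int.MaxValue.ToString().Length

// Hypothetical helper: left-aligns the number and pads it with spaces to `width` characters.
string Pad(int value, int width) =>
    value.ToString(CultureInfo.InvariantCulture).PadRight(width);

string blank = new string(' ', Width);
string placeholder = "<Table recCount=\"" + blank + "\" recLength=\"" + blank + "\">";
string patched = "<Table recCount=\"" + Pad(1000, Width) + "\" recLength=\"" + Pad(150, Width) + "\">";
Console.WriteLine(patched.Length == placeholder.Length); // True

// Consumers see the value with its trailing spaces, so they trim before parsing.
var table = XElement.Parse(patched + "</Table>");
int recCount = int.Parse(table.Attribute("recCount")!.Value.Trim(), CultureInfo.InvariantCulture);
int recLength = int.Parse(table.Attribute("recLength")!.Value.Trim(), CultureInfo.InvariantCulture);
Console.WriteLine($"{recCount} {recLength}"); // 1000 150
```

The cost of this variant is that every reader of the file must know about the padding.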
Or we could have appended the spaces after the > of the Table start tag:
<Table recCount="" recLength="">
(there are 20 spaces after the >)
Then:
<Table recCount="1000" recLength="150">
(now there are 13 spaces after the >)
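A sketch of that variant, with a hypothetical TableTag helper that pads the start tag with trailing spaces to a fixed total width. With the numbers from the example, the empty-valued tag is 32 characters, so a width of 52 leaves 20 spaces, and the final tag (39 characters) leaves 13. The padding becomes a whitespace-only text node before the first Rec, which most consumers ignore (XElement.Parse drops it by default). The width of 52 is an assumption chosen to match the example; top-level statements need C# 9 or later.

```csharp
using System;
using System.Xml.Linq;

// Total characters reserved for the <Table ...> start tag plus its trailing padding.
const int HeaderWidth = 52; // 32-char empty-valued tag + 20 spaces

// Hypothetical helper: builds the start tag and pads it with spaces to `width`.
string TableTag(string recCount, string recLength, int width)
{
    string tag = "<Table recCount=\"" + recCount + "\" recLength=\"" + recLength + "\">";
    if (tag.Length > width)
        throw new ArgumentException("reserved space is too small");
    return tag.PadRight(width);
}

string firstPass = TableTag("", "", HeaderWidth);
string secondPass = TableTag("1000", "150", HeaderWidth);
Console.WriteLine(firstPass.Length == secondPass.Length);       // True
Console.WriteLine(HeaderWidth - secondPass.TrimEnd().Length);   // 13

// The padding is just whitespace before the first child element.
var table = XElement.Parse(secondPass + "<Rec recId=\"1\">..</Rec></Table>");
Console.WriteLine(table.Attribute("recCount")!.Value);          // 1000
```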
Or we could simply have added the spaces on a new line, without the <!-- --> comment...
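A sketch of the "spaces on their own line" variant using XmlWriter.WriteWhitespace, assuming the same 16 characters of reserved space as the code below (20 digits at most, minus the 4 characters of $1 and $2). It writes to a StringWriter instead of a file and then applies the second-pass arithmetic to the text: the padding line loses exactly as many spaces as the Table tag gains characters. The values 1000 and 150 are the ones from the example.

```csharp
using System;
using System.IO;
using System.Xml;

const int Reserved = 16; // (10 + 10) digits at most, minus the 4 chars of "$1" + "$2"

// NewLineHandling.None keeps "\r\n" as written, on every OS.
var settings = new XmlWriterSettings { OmitXmlDeclaration = true, NewLineHandling = NewLineHandling.None };
var output = new StringWriter();
using (var xw = XmlWriter.Create(output, settings))
{
    xw.WriteStartElement("Table");
    xw.WriteAttributeString("recCount", "$1");
    xw.WriteAttributeString("recLength", "$2");
    // A line holding only spaces reserves room for the digits; no comment needed.
    xw.WriteWhitespace("\r\n" + new string(' ', Reserved));
    xw.WriteStartElement("Rec");
    xw.WriteAttributeString("recId", "1");
    xw.WriteString("..");
    xw.WriteEndElement();
    xw.WriteEndElement();
}
string firstPass = output.ToString();

// Second pass (on the text here, on the file in practice): fill in the values,
// then drop as many padding spaces as the tag grew.
string grown = firstPass.Replace("=\"$1\"", "=\"1000\"").Replace("=\"$2\"", "=\"150\"");
int growth = grown.Length - firstPass.Length;
string oldPad = "\r\n" + new string(' ', Reserved) + "<Rec";
string newPad = "\r\n" + new string(' ', Reserved - growth) + "<Rec";
string secondPass = grown.Replace(oldPad, newPad);
Console.WriteLine(secondPass.Length == firstPass.Length); // True
```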
Code:
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

int maxRecCountLength = 10; // int.MaxValue.ToString().Length
int maxRecLengthLength = 10; // int.MaxValue.ToString().Length
int tokenLength = 4; // 4 == "$1".Length + "$2".Length, see below what $1 and $2 are
// Note that the reserved space will be in the form ++++++++++++++++
string reservedSpace = new string('+', maxRecCountLength + maxRecLengthLength - tokenLength);
// You have to manually open the FileStream
using (var fs = new FileStream("out.xml", FileMode.Create))
// and add a StreamWriter on top of it
using (var sw = new StreamWriter(fs, Encoding.UTF8, 4096, true))
{
    // Here you write on your StreamWriter however you want.
    // Note that recCount and recLength have a placeholder $1 and $2.
    int recCount = 0;
    int maxRecLength = 0;
    // NewLineHandling.None writes our "\r\n" verbatim. With the default
    // (Replace) it becomes Environment.NewLine ("\n" outside Windows), and
    // the header lines rewritten below with "\r\n" would overrun the original.
    var settings = new XmlWriterSettings { NewLineHandling = NewLineHandling.None };
    using (var xw = XmlWriter.Create(sw, settings))
    {
        xw.WriteWhitespace("\r\n");
        xw.WriteStartElement("Table");
        xw.WriteAttributeString("recCount", "$1");
        xw.WriteAttributeString("recLength", "$2");
        // You have to add some space that will be
        // partially replaced by the recCount and recLength values
        xw.WriteWhitespace("\r\n");
        xw.WriteComment("Reserved space:" + reservedSpace);
        // <--------- BEGIN YOUR CODE
        for (int i = 0; i < 100; i++)
        {
            xw.WriteWhitespace("\r\n");
            xw.WriteStartElement("Rec");
            string str = string.Format("Some number: {0}", i);
            if (str.Length > maxRecLength)
            {
                maxRecLength = str.Length;
            }
            xw.WriteValue(str);
            recCount++;
            xw.WriteEndElement();
        }
        // <--------- END YOUR CODE
        xw.WriteWhitespace("\r\n");
        xw.WriteEndElement();
    }
    sw.Flush();
    // Now we read the first lines to modify them (normally three lines:
    // the xml declaration, the <Table start tag and the
    // <!--Reserved space: comment)
    fs.Position = 0;
    var lines = new List<string>();
    using (var sr = new StreamReader(fs, sw.Encoding, false, 4096, true))
    {
        while (true)
        {
            string str = sr.ReadLine();
            if (str == null)
            {
                throw new InvalidDataException("<Table start tag not found");
            }
            lines.Add(str);
            if (str.StartsWith("<Table", StringComparison.Ordinal))
            {
                // We read the next line, the comment line
                str = sr.ReadLine();
                lines.Add(str);
                break;
            }
        }
    }
    string strCount = XmlConvert.ToString(recCount);
    string strMaxRecLength = XmlConvert.ToString(maxRecLength);
    // We replace the placeholder tokens with the real values
    int oldLen = lines[lines.Count - 2].Length;
    lines[lines.Count - 2] = lines[lines.Count - 2].Replace("=\"$1\"", string.Format("=\"{0}\"", strCount));
    lines[lines.Count - 2] = lines[lines.Count - 2].Replace("=\"$2\"", string.Format("=\"{0}\"", strMaxRecLength));
    int newLen = lines[lines.Count - 2].Length;
    // Shorten the reserved space by exactly as many characters as the <Table line grew
    lines[lines.Count - 1] = lines[lines.Count - 1].Replace(":" + reservedSpace, ":" + new string('#', reservedSpace.Length - newLen + oldLen));
    // We move back to just after the UTF8/UTF16 preamble
    fs.Position = sw.Encoding.GetPreamble().Length;
    // And we rewrite the lines
    foreach (string str in lines)
    {
        sw.Write(str);
        sw.Write("\r\n");
    }
}
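To check the result without loading the multi-GB file, the header can be read back with a streaming XmlReader that stops at the Table start tag. A minimal sketch, assuming the file layout produced above; ReadTableHeader is a hypothetical helper, and the demo writes a tiny stand-in file so that it runs on its own (pass "out.xml" in real use):

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper: reads recCount/recLength from the root <Table> element.
// XmlReader streams and we return as soon as we reach <Table>, so only the
// beginning of the file is read.
(int RecCount, int RecLength) ReadTableHeader(string filePath)
{
    using var reader = XmlReader.Create(filePath);
    reader.MoveToContent(); // skips the XML declaration and the leading whitespace
    if (reader.NodeType != XmlNodeType.Element || reader.Name != "Table")
        throw new InvalidDataException("root element is not <Table>");
    return (XmlConvert.ToInt32(reader.GetAttribute("recCount")!),
            XmlConvert.ToInt32(reader.GetAttribute("recLength")!));
}

// Tiny stand-in for out.xml, shaped like the output of the second pass.
string path = Path.Combine(Path.GetTempPath(), "table-header-demo.xml");
File.WriteAllText(path,
    "<?xml version=\"1.0\" encoding=\"utf-8\"?>\r\n" +
    "<Table recCount=\"1000\" recLength=\"150\">\r\n" +
    "<!--Reserved space:#############-->\r\n" +
    "<Rec recId=\"1\">..</Rec>\r\n" +
    "</Table>");
var (recCount, recLength) = ReadTableHeader(path);
File.Delete(path);
Console.WriteLine($"recCount={recCount}, recLength={recLength}");
```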
The slower .NET 3.5 way
In .NET 3.5, StreamReader and StreamWriter always close the underlying FileStream when they are disposed (the constructors with a leaveOpen parameter were only added in .NET 4.5), so I have to reopen the file several times. This is a little slower.
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

int maxRecCountLength = 10; // int.MaxValue.ToString().Length
int maxRecLengthLength = 10; // int.MaxValue.ToString().Length
int tokenLength = 4; // 4 == "$1".Length + "$2".Length, see below what $1 and $2 are
// Note that the reserved space will be in the form ++++++++++++++++
string reservedSpace = new string('+', maxRecCountLength + maxRecLengthLength - tokenLength);
string fileName = "out.xml";
int recCount = 0;
int maxRecLength = 0;
// NewLineHandling.None writes our "\r\n" verbatim, so the lines rewritten
// below with "\r\n" have exactly the same length as the original ones.
var settings = new XmlWriterSettings { NewLineHandling = NewLineHandling.None };
using (var sw = new StreamWriter(fileName))
{
    // Here you write on your StreamWriter however you want.
    // Note that recCount and recLength have a placeholder $1 and $2.
    using (var xw = XmlWriter.Create(sw, settings))
    {
        xw.WriteWhitespace("\r\n");
        xw.WriteStartElement("Table");
        xw.WriteAttributeString("recCount", "$1");
        xw.WriteAttributeString("recLength", "$2");
        // You have to add some space that will be
        // partially replaced by the recCount and recLength values
        xw.WriteWhitespace("\r\n");
        xw.WriteComment("Reserved space:" + reservedSpace);
        // <--------- BEGIN YOUR CODE
        for (int i = 0; i < 100; i++)
        {
            xw.WriteWhitespace("\r\n");
            xw.WriteStartElement("Rec");
            string str = string.Format("Some number: {0}", i);
            if (str.Length > maxRecLength)
            {
                maxRecLength = str.Length;
            }
            xw.WriteValue(str);
            recCount++;
            xw.WriteEndElement();
        }
        // <--------- END YOUR CODE
        xw.WriteWhitespace("\r\n");
        xw.WriteEndElement();
    }
}
var lines = new List<string>();
using (var sr = new StreamReader(fileName))
{
    // Now we read the first lines to modify them (normally three lines:
    // the xml declaration, the <Table start tag and the
    // <!--Reserved space: comment)
    while (true)
    {
        string str = sr.ReadLine();
        if (str == null)
        {
            throw new InvalidDataException("<Table start tag not found");
        }
        lines.Add(str);
        if (str.StartsWith("<Table", StringComparison.Ordinal))
        {
            // We read the next line, the comment line
            str = sr.ReadLine();
            lines.Add(str);
            break;
        }
    }
}
// new StreamWriter(fileName) would truncate the file, so we open it with
// File.OpenWrite (which does not truncate) and overwrite only the first lines.
using (var fs = File.OpenWrite(fileName))
using (var sw = new StreamWriter(fs))
{
    string strCount = XmlConvert.ToString(recCount);
    string strMaxRecLength = XmlConvert.ToString(maxRecLength);
    // We replace the placeholder tokens with the real values
    int oldLen = lines[lines.Count - 2].Length;
    lines[lines.Count - 2] = lines[lines.Count - 2].Replace("=\"$1\"", string.Format("=\"{0}\"", strCount));
    lines[lines.Count - 2] = lines[lines.Count - 2].Replace("=\"$2\"", string.Format("=\"{0}\"", strMaxRecLength));
    int newLen = lines[lines.Count - 2].Length;
    // Shorten the reserved space by exactly as many characters as the <Table line grew
    lines[lines.Count - 1] = lines[lines.Count - 1].Replace(":" + reservedSpace, ":" + new string('#', reservedSpace.Length - newLen + oldLen));
    // We move back to just after the preamble (StreamWriter's default
    // UTF-8 encoding has none, so this is position 0)
    sw.BaseStream.Position = sw.Encoding.GetPreamble().Length;
    // And we rewrite the lines
    foreach (string str in lines)
    {
        sw.Write(str);
        sw.Write("\r\n");
    }
}
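The reopening can be cut down to one extra open by skipping StreamReader/StreamWriter in the second pass and patching bytes directly on a single FileStream, which no reader or writer can close behind your back. This is a sketch of an alternative, not the approach used above: PatchHeader and IndexOf are hypothetical helpers, the header is assumed to be ASCII/UTF-8 text, and the replacement must be exactly as many bytes as the placeholder. The FileStream calls exist in .NET 3.5 too; only the top-level statements and local functions need a newer C# compiler.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: finds `placeholder` in the first `searchLimit` bytes of the
// file and overwrites it in place with `replacement`, which must have the same byte length.
void PatchHeader(string filePath, string placeholder, string replacement, int searchLimit = 4096)
{
    byte[] oldBytes = Encoding.UTF8.GetBytes(placeholder);
    byte[] newBytes = Encoding.UTF8.GetBytes(replacement);
    if (oldBytes.Length != newBytes.Length)
        throw new ArgumentException("replacement must be exactly as long as the placeholder");

    using (var fs = new FileStream(filePath, FileMode.Open, FileAccess.ReadWrite))
    {
        var head = new byte[(int)Math.Min(searchLimit, fs.Length)];
        int read = 0;
        while (read < head.Length)
        {
            int n = fs.Read(head, read, head.Length - read);
            if (n == 0) break;
            read += n;
        }
        int index = IndexOf(head, read, oldBytes);
        if (index < 0)
            throw new InvalidDataException("placeholder not found in the file header");
        fs.Position = index;                        // seek back to the placeholder...
        fs.Write(newBytes, 0, newBytes.Length);     // ...and overwrite it; nothing after it moves
    }
}

// Naive byte search, good enough for a few KB of header.
int IndexOf(byte[] haystack, int count, byte[] needle)
{
    for (int i = 0; i + needle.Length <= count; i++)
    {
        int j = 0;
        while (j < needle.Length && haystack[i + j] == needle[j]) j++;
        if (j == needle.Length) return i;
    }
    return -1;
}

// Demo on a small stand-in for the first-pass output (no BOM, as with new StreamWriter(fileName)).
string reserved = new string('+', 16);
string oldTag = "<Table recCount=\"$1\" recLength=\"$2\">";
string newTag = "<Table recCount=\"1000\" recLength=\"150\">";
string oldHeader = oldTag + "\r\n<!--Reserved space:" + reserved + "-->";
string newHeader = newTag + "\r\n<!--Reserved space:" + new string('#', reserved.Length - (newTag.Length - oldTag.Length)) + "-->";

string path = Path.Combine(Path.GetTempPath(), "patch-header-demo.xml");
File.WriteAllText(path, "<?xml version=\"1.0\" encoding=\"utf-8\"?>\r\n" + oldHeader + "\r\n<Rec recId=\"1\">..</Rec>\r\n</Table>");
PatchHeader(path, oldHeader, newHeader);
string result = File.ReadAllText(path);
File.Delete(path);
Console.WriteLine(result);
```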