Split XML document apart creating multiple output files from repeating elements
Problem description
I need to take an XML file and create multiple output xml files from the repeating nodes of the input file. The source file "AnimalBatch.xml" looks like this:
<?xml version="1.0" encoding="utf-8" ?>
<Animals>
<Animal id="1001">
<Quantity>One</Quantity>
<Adjective>Red</Adjective>
<Name>Rooster</Name>
</Animal>
<Animal id="1002">
<Quantity>Two</Quantity>
<Adjective>Stubborn</Adjective>
<Name>Donkeys</Name>
</Animal>
<Animal id="1003">
<Quantity>Three</Quantity>
<Color>Blind</Color>
<Name>Mice</Name>
</Animal>
</Animals>
The program needs to split the repeating "Animal" and produce 3 files named: Animal_1001.xml, Animal_1002.xml, and Animal_1003.xml
Each output file should contain just its respective Animal element (which will be the root). The id attribute from AnimalBatch.xml will supply the sequence number for the Animal_xxxx.xml filenames. The id attribute does not need to be in the output files.
Animal_1001.xml:
<?xml version="1.0" encoding="utf-8"?>
<Animal>
<Quantity>One</Quantity>
<Adjective>Red</Adjective>
<Name>Rooster</Name>
</Animal>
Animal_1002.xml:
<?xml version="1.0" encoding="utf-8"?>
<Animal>
<Quantity>Two</Quantity>
<Adjective>Stubborn</Adjective>
<Name>Donkeys</Name>
</Animal>
Animal_1003.xml:
<?xml version="1.0" encoding="utf-8"?>
<Animal>
<Quantity>Three</Quantity>
<Adjective>Blind</Adjective>
<Name>Mice</Name>
</Animal>
I want to do this with XmlDocument, since it needs to be able to run on .NET 2.0.
My program looks like this:
static void Main(string[] args)
{
    string strFileName;
    string strSeq;
    XmlDocument doc = new XmlDocument();
    doc.Load("D:\\Rick\\Computer\\XML\\AnimalBatch.xml");
    XmlNodeList nl = doc.DocumentElement.SelectNodes("Animal");
    foreach (XmlNode n in nl)
    {
        strSeq = n.Attributes["id"].Value;
        XmlDocument outdoc = new XmlDocument();
        XmlNode rootnode = outdoc.CreateNode("element", "Animal", "");
        outdoc.AppendChild(rootnode); // Put the wrapper element into outdoc
        outdoc.ImportNode(n, true);   // place the node n into outdoc
        outdoc.AppendChild(n);        // This statement errors:
        // "The node to be inserted is from a different document context."
        strFileName = "Animal_" + strSeq + ".xml";
        outdoc.Save(Console.Out);
        Console.WriteLine();
    }
    Console.WriteLine("END OF PROGRAM: Press <ENTER>");
    Console.ReadLine();
}
I think I have 2 problems.
A) After doing the ImportNode on node n into outdoc, I call outdoc.AppendChild(n), which complains: "The node to be inserted is from a different document context." I do not know if this is a scope issue referencing node n within the foreach loop, or if I am somehow not using ImportNode() or AppendChild() properly. The second argument to ImportNode() is set to true because I want the child elements of Animal (3 fields arbitrarily named Quantity, Adjective, and Name) to end up in the destination file.
B) The second problem is getting the Animal element into outdoc. I'm getting '<Animal />' but I need '<Animal></Animal>' so I can place node n inside it. I think my problem is how I am doing: outdoc.AppendChild(rootnode);
To show the xml, I'm doing: outdoc.Save(Console.Out); I do have the code to save() to an output file - which does work, as long as I can get outdoc assembled properly.
There is a similar question at: Split XML in Multiple XML files, but I don't understand the solution code yet. I think I'm pretty close on this approach, and will appreciate any help you can provide.
I'm going to be doing this same task using XmlReader, since I'm going to need to be able to handle large input files, and I understand that XmlDocument reads the whole thing in and can cause memory issues.
Recommended answer
Here is a simple method that seems to be what you are looking for:
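// Requires: using System; using System.Xml;
public void test_xml_split()
{
    XmlDocument doc = new XmlDocument();
    doc.Load("C:\\animals.xml");
    foreach (XmlNode animalNode in doc.SelectNodes("//Animals/Animal"))
    {
        // Each Animal gets its own document, with the Animal element as root.
        XmlDocument newXmlDoc = new XmlDocument();
        // ImportNode returns a copy owned by newXmlDoc. Append that copy,
        // not animalNode, which still belongs to the source document.
        // Declared as XmlNode rather than var so that C# 2.0 compilers accept it.
        XmlNode targetNode = newXmlDoc.ImportNode(animalNode, true);
        newXmlDoc.AppendChild(targetNode);
        newXmlDoc.Save(Console.Out);
        Console.WriteLine();
    }
}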
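This addresses both problems from the question. ImportNode does not attach anything to the target document: it returns a copy owned by that document, and the returned copy is what has to be appended. The wrapper element built with CreateNode is not needed either, because the imported Animal element can itself be the root. The following is a minimal sketch rather than a definitive implementation. It also writes one file per animal and drops the id attribute. The class name AnimalSplitter, the Split method, and the outputDir parameter are hypothetical names introduced for illustration, and the sketch assumes that every Animal element has an id attribute and that the output directory already exists.

```csharp
using System;
using System.IO;
using System.Xml;

public static class AnimalSplitter
{
    // Writes each <Animal> child of the root element to its own file in
    // outputDir and returns the number of files written.
    public static int Split(string inputPath, string outputDir)
    {
        XmlDocument doc = new XmlDocument();
        doc.Load(inputPath);

        int count = 0;
        foreach (XmlNode node in doc.DocumentElement.SelectNodes("Animal"))
        {
            XmlElement animal = (XmlElement)node;
            string id = animal.GetAttribute("id"); // assumed to be present

            XmlDocument outDoc = new XmlDocument();
            outDoc.AppendChild(outDoc.CreateXmlDeclaration("1.0", "utf-8", null));

            // ImportNode returns a deep copy owned by outDoc; the original
            // node stays in the source document.
            XmlElement copy = (XmlElement)outDoc.ImportNode(animal, true);
            copy.RemoveAttribute("id"); // the id is only needed for the file name
            outDoc.AppendChild(copy);   // the copy becomes the root element

            outDoc.Save(Path.Combine(outputDir, "Animal_" + id + ".xml"));
            count++;
        }
        return count;
    }
}
```

A call such as AnimalSplitter.Split("D:\\Rick\\Computer\\XML\\AnimalBatch.xml", "D:\\Rick\\Computer\\XML") would then place the Animal_xxxx.xml files next to the input file.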
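For the large-file case mentioned in the question, one possible approach is to stream the input with XmlReader and materialize only one Animal element at a time through XmlDocument.ReadNode, which builds a node from the reader's current position and advances the reader past it. This is a hedged sketch under the same assumptions as the question (Animal elements are direct children of the root and each carries an id attribute). The names AnimalStreamSplitter, Split, and outputDir are hypothetical and not taken from the original post.

```csharp
using System;
using System.IO;
using System.Xml;

public static class AnimalStreamSplitter
{
    // Streams inputPath and writes each top-level <Animal> element to its own
    // file in outputDir. Only one Animal subtree is held in memory at a time.
    public static int Split(string inputPath, string outputDir)
    {
        int count = 0;
        using (XmlReader reader = XmlReader.Create(inputPath))
        {
            reader.Read();
            while (!reader.EOF)
            {
                // Depth 1 means a direct child of the root element.
                if (reader.NodeType == XmlNodeType.Element
                    && reader.Depth == 1
                    && reader.Name == "Animal")
                {
                    XmlDocument outDoc = new XmlDocument();
                    outDoc.AppendChild(outDoc.CreateXmlDeclaration("1.0", "utf-8", null));

                    // ReadNode consumes the whole Animal subtree and leaves the
                    // reader on the node that follows it, so no extra Read() here.
                    XmlElement animal = (XmlElement)outDoc.ReadNode(reader);
                    string id = animal.GetAttribute("id"); // assumed to be present
                    animal.RemoveAttribute("id");
                    outDoc.AppendChild(animal);

                    outDoc.Save(Path.Combine(outputDir, "Animal_" + id + ".xml"));
                    count++;
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return count;
    }
}
```

The node returned by ReadNode is already owned by outDoc, so no ImportNode call is needed in this variant.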