Split XML document apart creating multiple output files from repeating elements


Problem description

I need to take an XML file and create multiple output xml files from the repeating nodes of the input file. The source file "AnimalBatch.xml" looks like this:

<?xml version="1.0" encoding="utf-8" ?>
<Animals>
<Animal id="1001">
<Quantity>One</Quantity>
<Adjective>Red</Adjective>
<Name>Rooster</Name>
</Animal>
<Animal id="1002">
<Quantity>Two</Quantity>
<Adjective>Stubborn</Adjective>
<Name>Donkeys</Name>
</Animal>
<Animal id="1003">
<Quantity>Three</Quantity>
<Adjective>Blind</Adjective>
<Name>Mice</Name>
</Animal>
</Animals>

The program needs to split the repeating "Animal" and produce 3 files named: Animal_1001.xml, Animal_1002.xml, and Animal_1003.xml

Each output file should contain just its own Animal element, which becomes the root. The id attribute from AnimalBatch.xml supplies the sequence number for the Animal_xxxx.xml filenames. The id attribute does not need to be in the output files.

Animal_1001.xml:
<?xml version="1.0" encoding="utf-8"?>
<Animal>
<Quantity>One</Quantity>
<Adjective>Red</Adjective>
<Name>Rooster</Name>
</Animal>

Animal_1002.xml:
<?xml version="1.0" encoding="utf-8"?>
<Animal>
<Quantity>Two</Quantity>
<Adjective>Stubborn</Adjective>
<Name>Donkeys</Name>
</Animal>

Animal_1003.xml:
<?xml version="1.0" encoding="utf-8"?>
<Animal>
<Quantity>Three</Quantity>
<Adjective>Blind</Adjective>
<Name>Mice</Name>
</Animal>

I want to do this with XmlDocument, since it needs to be able to run on .Net 2.0.

My program looks like this:

    static void Main(string[] args)
    {
        string strFileName;    
        string strSeq;                    

        XmlDocument doc = new XmlDocument(); 
        doc.Load("D:\\Rick\\Computer\\XML\\AnimalBatch.xml");

        XmlNodeList nl = doc.DocumentElement.SelectNodes("Animal");

        foreach (XmlNode n in nl)
        {
            strSeq = n.Attributes["id"].Value;

            XmlDocument outdoc = new XmlDocument();
            XmlNode rootnode = outdoc.CreateNode("element", "Animal", "");

            outdoc.AppendChild(rootnode); // Put the wrapper element into outdoc

            outdoc.ImportNode(n, true);   // place the node n into outdoc
            outdoc.AppendChild(n);        // This statement errors:
            // "The node to be inserted is from a different document context."

            strFileName = "Animal_" + strSeq + ".xml";

            outdoc.Save(Console.Out);
            Console.WriteLine();
        }
        Console.WriteLine("END OF PROGRAM:  Press <ENTER>");
        Console.ReadLine();
    }

I think I have 2 problems.

A) After calling ImportNode on node n into outdoc, I call outdoc.AppendChild(n), which complains: "The node to be inserted is from a different document context." I do not know if this is a scope issue with referencing node n inside the foreach loop, or if I am somehow not using ImportNode() or AppendChild() properly. The second argument to ImportNode() is set to true because I want the child elements of Animal (3 fields arbitrarily named Quantity, Adjective, and Name) to end up in the destination file.

B) The second problem is getting the Animal element into outdoc. I'm getting '<Animal />' but I need '<Animal> </Animal>' so I can place node n inside it. I think my problem is how I am doing: outdoc.AppendChild(rootnode);

To show the xml, I'm doing: outdoc.Save(Console.Out); I do have the code to save() to an output file - which does work, as long as I can get outdoc assembled properly.

There is a similar question at: Split XML in Multiple XML files, but I don't understand the solution code yet. I think I'm pretty close on this approach, and will appreciate any help you can provide.

I'm going to be doing this same task using XmlReader, since I'm going to need to be able to handle large input files, and I understand that XmlDocument reads the whole thing in and can cause memory issues.
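For the XmlReader variant mentioned above, one possible shape is sketched below. It is only a sketch under assumptions: the class name StreamingAnimalSplitter, the method Split, and the parameters inputPath and outputDir are hypothetical and not from the original post. It relies on XmlDocument.ReadNode, which builds a single node from the reader and advances the reader past it, so only one Animal element should be held in memory at a time.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper; names are illustrative, not from the original post.
public static class StreamingAnimalSplitter
{
    // Streams through inputPath and writes each <Animal> element to
    // outputDir as Animal_<id>.xml. Returns the paths that were written.
    public static List<string> Split(string inputPath, string outputDir)
    {
        List<string> written = new List<string>();

        using (XmlReader reader = XmlReader.Create(inputPath))
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Animal")
                {
                    XmlDocument outDoc = new XmlDocument();
                    outDoc.AppendChild(outDoc.CreateXmlDeclaration("1.0", "utf-8", null));

                    // ReadNode creates a node owned by outDoc and leaves the
                    // reader on whatever follows </Animal>, so we must not
                    // call Read() again in this branch.
                    XmlElement animal = (XmlElement)outDoc.ReadNode(reader);
                    string id = animal.GetAttribute("id");
                    animal.RemoveAttribute("id"); // id is only needed for the filename
                    outDoc.AppendChild(animal);

                    string path = Path.Combine(outputDir, "Animal_" + id + ".xml");
                    outDoc.Save(path);
                    written.Add(path);
                }
                else
                {
                    // Skip the root element, whitespace, and anything else.
                    reader.Read();
                }
            }
        }
        return written;
    }
}
```

The if/else structure is deliberate: calling Read() after ReadNode would skip an Animal element that immediately follows the previous one.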

Recommended answer

Here is a simple method that seems to be what you are looking for:

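// Requires: using System; using System.Xml;
public void test_xml_split()
{
    XmlDocument doc = new XmlDocument();
    doc.Load("C:\\animals.xml");

    foreach (XmlNode animalNode in doc.SelectNodes("//Animals/Animal"))
    {
        // A fresh document for each Animal element
        XmlDocument newXmlDoc = new XmlDocument();
        // ImportNode returns a copy owned by newXmlDoc; append that copy,
        // not the original node (which still belongs to doc)
        XmlNode targetNode = newXmlDoc.ImportNode(animalNode, true);
        newXmlDoc.AppendChild(targetNode);
        newXmlDoc.Save(Console.Out);
        Console.WriteLine();
    }
}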
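The method above prints each document to the console and keeps the id attribute. To cover the rest of what the question asks for (one file per Animal, named from the id, with the id dropped from the output), a minimal sketch could look like the following. The class name AnimalSplitter, the method Split, and the parameters inputPath and outputDir are hypothetical names chosen for illustration. It avoids `var` and LINQ so that it should compile for .NET 2.0.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper; names are illustrative, not from the original answer.
public static class AnimalSplitter
{
    // Writes each <Animal> child of the root element to outputDir as
    // Animal_<id>.xml and returns the paths that were written.
    public static List<string> Split(string inputPath, string outputDir)
    {
        XmlDocument doc = new XmlDocument();
        doc.Load(inputPath);

        List<string> written = new List<string>();
        foreach (XmlNode node in doc.DocumentElement.SelectNodes("Animal"))
        {
            XmlElement source = (XmlElement)node;
            string id = source.GetAttribute("id");

            XmlDocument outDoc = new XmlDocument();
            outDoc.AppendChild(outDoc.CreateXmlDeclaration("1.0", "utf-8", null));

            // ImportNode does not move the node; it returns a deep copy that
            // belongs to outDoc. Appending the copy avoids the
            // "different document context" error.
            XmlElement copy = (XmlElement)outDoc.ImportNode(source, true);
            copy.RemoveAttribute("id"); // id is only needed for the filename
            outDoc.AppendChild(copy);

            string path = Path.Combine(outputDir, "Animal_" + id + ".xml");
            outDoc.Save(path);
            written.Add(path);
        }
        return written;
    }
}
```

No separate wrapper element is created here: the imported Animal copy is itself appended as the document root, which sidesteps problem B from the question.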
