Parsing a large XML file to multiple output xmls, using XmlReader - getting every other element

Views: 98 Posted: 2020/4/29 3:32:48 Tags: c# xml split large-files xmlreader

Problem description

I need to take a very large XML file and create multiple output xml files from what could be thousands of repeating nodes of the input file. There is no whitespace in the source file "AnimalBatch.xml" which looks like this:

<?xml version="1.0" encoding="utf-8" ?><Animals><Animal id="1001"><Quantity>One</Quantity><Adjective>Red</Adjective><Name>Rooster</Name></Animal><Animal id="1002"><Quantity>Two</Quantity><Adjective>Stubborn</Adjective><Name>Donkeys</Name></Animal><Animal id="1003"><Quantity>Three</Quantity><Adjective>Blind</Adjective><Name>Mice</Name></Animal><Animal id="1004"><Quantity>Four</Quantity><Adjective>Purple</Adjective><Name>Horses</Name></Animal><Animal id="1005"><Quantity>Five</Quantity><Adjective>Long</Adjective><Name>Centipedes</Name></Animal><Animal id="1006"><Quantity>Six</Quantity><Adjective>Dark</Adjective><Name>Owls</Name></Animal></Animals>

The program needs to split the repeating "Animal" and produce the appropriate number of files named: Animal_1001.xml, Animal_1002.xml, Animal_1003.xml, etc.

Animal_1001.xml:
<?xml version="1.0" encoding="utf-8"?>
<Animal>
<Quantity>One</Quantity>
<Adjective>Red</Adjective>
<Name>Rooster</Name>
</Animal>

Animal_1002.xml:
<?xml version="1.0" encoding="utf-8"?>
<Animal>
<Quantity>Two</Quantity>
<Adjective>Stubborn</Adjective>
<Name>Donkeys</Name>
</Animal>

Animal_1003.xml:
<?xml version="1.0" encoding="utf-8"?>
<Animal>
<Quantity>Three</Quantity>
<Adjective>Blind</Adjective>
<Name>Mice</Name>
</Animal>

The code below works, but only if the input file has a CR/LF after each <Animal id="xxxx"> element. If there is no whitespace (my file has none, and I can't get it delivered that way), I only get every other element (the odd-numbered animals).

    static void SplitXMLReader()
    {
        string strFileName;
        string strSeq = "";

        XmlReader doc = XmlReader.Create("C:\\AnimalBatch.xml");

        while (doc.Read())
        {
            if ( doc.Name == "Animal"  && doc.NodeType == XmlNodeType.Element )
            {
                strSeq = doc.GetAttribute("id"); 

                XmlDocument outdoc = new XmlDocument();
                XmlDeclaration xmlDeclaration = outdoc.CreateXmlDeclaration("1.0", "utf-8", null);                     
                XmlElement rootNode = outdoc.CreateElement(doc.Name);

                rootNode.InnerXml = doc.ReadInnerXml();  
                // This seems to be advancing the cursor in doc too far.

                outdoc.InsertBefore(xmlDeclaration, outdoc.DocumentElement);
                outdoc.AppendChild(rootNode);

                strFileName = "Animal_" + strSeq + ".xml";
                outdoc.Save("C:\\" + strFileName);                    
            }
        }
    }

My understanding is that "whitespace" or formatting in XML should make no difference to XmlReader - but I've tried this both ways, with and without CR/LF's after the <Animal id="xxxx">, and can confirm there is a difference. If it has CR/LFs (possibly even just a space, which I'll try next) - it gets each <Animal> node processed fully, and saved under the right filename that comes from the id attribute.

Can someone let me know what's going on here - and a possible workaround?
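
A likely explanation, offered as a sketch rather than a definitive answer: `ReadInnerXml()` consumes the element's content and its end tag, leaving the reader positioned on whatever node follows `</Animal>`. With no whitespace, that node is the next `<Animal>` start tag, and the `doc.Read()` in the `while` condition then steps past it, so every second element is skipped. If there is whitespace between `</Animal>` and the next `<Animal>`, the reader lands on a whitespace node and the same `Read()` skips that instead. One possible workaround is to call `Read()` only when the current node was not consumed by `ReadInnerXml()`. The class name `AnimalSplitter`, the method `Split`, and its `inputPath`/`outputDir` parameters are hypothetical names chosen for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

public static class AnimalSplitter
{
    // Writes each <Animal> element of inputPath to outputDir as Animal_<id>.xml.
    // Returns the number of files written. (Names here are illustrative only.)
    public static int Split(string inputPath, string outputDir)
    {
        int count = 0;

        using (XmlReader reader = XmlReader.Create(inputPath))
        {
            // Loop on EOF instead of Read(), so the reader is advanced
            // exactly once per iteration, by one branch or the other.
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Animal")
                {
                    string id = reader.GetAttribute("id");

                    XmlDocument outDoc = new XmlDocument();
                    outDoc.AppendChild(outDoc.CreateXmlDeclaration("1.0", "utf-8", null));
                    XmlElement root = outDoc.CreateElement("Animal");
                    outDoc.AppendChild(root);

                    // ReadInnerXml() moves the reader past </Animal> onto the
                    // following node, so no extra Read() is issued here.
                    root.InnerXml = reader.ReadInnerXml();

                    outDoc.Save(Path.Combine(outputDir, "Animal_" + id + ".xml"));
                    count++;
                }
                else
                {
                    // Not an <Animal> start tag: advance to the next node.
                    reader.Read();
                }
            }
        }

        return count;
    }
}
```

Calling `AnimalSplitter.Split("C:\\AnimalBatch.xml", "C:\\")` should then behave the same whether or not the input contains whitespace between elements. Because only one `<Animal>` is held in memory at a time, this keeps the streaming benefit of `XmlReader` for very large inputs.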
