Efficient way to read large XML into different node types in C#


Question

I am new to C#. I have a relatively large XML file (28MB) and am trying to parse its subtrees into several different types based on their content. Essentially, I have 6900+ Content nodes that all have to be interrogated to figure out what type they are.

<Collections>
    <Content>..</Content>
    <Content>..</Content>
    <Content>..</Content>
    ...
</Collections>

For each Content node, the variety of nodes below it can have 1 of 3 different patterns. I have to look into the node to decide which pattern/type of object I am looking at.

So imagine a Content node that has about 100 subnodes in it, and the 14th node (in one case) has a URL in it and indicates it is a "type 1" and should have fields 1, 2, 3,...17, 28, 47 and 58 written to the DB.

Another type has an indicative pair of elements (let's say element 3 and 58) and indicates it is a "type 2" and should have a different set of elements written to the DB.

And so on...

From there, I map the objects into a CMS/DB and connect various bits of data to fields in that other system and write data from the pertinent elements over to the DB.

Since the source file is large, I would love to efficiently pull subtrees out of the larger file, move up and down within them (to decide on their types), and then write the important data (map them) over to the DB.

Do I have to store the values along the way somehow and decide after I have stored them, what sort of object this is?

I am torn between the forward-only approach of XmlReader and the ease of use of a DOM-based approach.

Thank you for your suggestions.

===edit==== Thank you commenters. The structure inside of the Content nodes would have 1 of 3 patterns in it. There are about 100 nodes in each type, so I did not bother pasting them in for readability's sake. I did try and clarify above though.

Recommended answer

With large files you should use XmlReader. I prefer a combination of XmlReader and LINQ to XML. Try the following:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;

namespace ConsoleApplication1
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.xml";
        static void Main(string[] args)
        {
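            // Stream the file so that only one Content subtree is in memory at a time.
            using (XmlReader reader = XmlReader.Create(FILENAME))
            {
                while (!reader.EOF)
                {
                    // Skip ahead unless the reader is already positioned on a Content node.
                    if (reader.Name != "Content")
                    {
                        reader.ReadToFollowing("Content");
                    }
                    if (!reader.EOF)
                    {
                        // Loads this subtree and leaves the reader on the node after </Content>.
                        XElement content = (XElement)XElement.ReadFrom(reader);
                        // Inspect 'content' here to decide which of the three types it is.
                    }
                }
            }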
        }
    }
}
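
Because each Content subtree arrives as a complete XElement, there is no need to store values along the way: the whole subtree can be queried in any order before deciding its type. A minimal sketch of that step follows. The element names Url, Field3 and Field58, the ContentKind enum and the ContentClassifier class are hypothetical placeholders, since the question does not show the real element names or rules.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Hypothetical type labels; the question only says there are three patterns.
public enum ContentKind { Type1, Type2, Unknown }

public static class ContentClassifier
{
    // Lazily yields each <Content> subtree, one at a time, from the input.
    public static IEnumerable<XElement> StreamContent(TextReader input)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Content")
                {
                    // ReadFrom consumes the subtree and leaves the reader on the
                    // node after </Content>, so no extra Read() is needed here.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    // Assumed rules: a Url child marks type 1; the pair Field3 + Field58 marks type 2.
    // Replace these placeholder names with the real indicator elements.
    public static ContentKind Classify(XElement content)
    {
        if (content.Element("Url") != null)
        {
            return ContentKind.Type1;
        }
        if (content.Element("Field3") != null && content.Element("Field58") != null)
        {
            return ContentKind.Type2;
        }
        return ContentKind.Unknown;
    }
}
```

A caller could loop with `foreach (XElement content in ContentClassifier.StreamContent(File.OpenText(path)))`, call `Classify(content)`, and then map only the fields relevant to that type to the DB. The sketch uses direct children via `Element`; if the indicator elements sit deeper in the subtree, `Descendants` would be the analogous call.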
