Loading large XML on DataSet (OutOfMemory Exception)


Problem description

I am trying to read a 3 GB XML file from a URL and store all the jobs in a DataSet. The XML looks like this:

    <?xml version="1.0"?>
    <feed total="1621473">
      <job>
        <title><![CDATA[Certified Medical Assistant]]></title>
        <date>2016-03-25 14:19:38</date>
        <referencenumber>2089677765</referencenumber>
        <url><![CDATA[http://www.jobs2careers.com/click.php?id=2089677765.1347]]></url>
        <company><![CDATA[Broadway Medical Clinic]]></company>
        <city>Portland</city>
        <state>OR</state>
        <zip>97213</zip>
     </job>
     <job>
        <title><![CDATA[Certified Medical Assistant]]></title>
        <date>2016-03-25 14:19:38</date>
        <referencenumber>2089677765</referencenumber>
        <url><![CDATA[http://www.jobs2careers.com/click.php?id=2089677765.1347]]></url>
        <company><![CDATA[Broadway Medical Clinic]]></company>
        <city>Portland</city>
        <state>OR</state>
        <zip>97213</zip>
     </job>
    </feed>

Here is my code:

    XmlDocument doc = new XmlDocument();
    doc.Load(url);
    DataSet ds = new DataSet();
    XmlNodeReader xmlReader = new XmlNodeReader(doc);

    while (xmlReader.ReadToFollowing("job"))
    {
        ds.ReadXml(xmlReader);
    }

But I got an OutOfMemory exception. I searched on Google and found this:

    DataSet ds = new DataSet();
    FileStream filestream = File.OpenRead(url);
    BufferedStream buffered = new BufferedStream(filestream);
    ds.ReadXml(buffered);

I still get the same exception. I also read about XmlTextReader, but I don't know how to use it in my case. I know why I am getting the exception, but I don't know how to overcome it. Thanks.

Recommended answer

Instead of trying to load the entire file into the DataSet or another container, how about loading it in batches and writing each batch to the database, so that whatever is holding the batch can be cleared each time?

How to: Perform Streaming Transform of Large XML Documents https://msdn.microsoft.com/en-us/library/bb387013.aspx

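        // Requires: using System.Collections.Generic; using System.Xml; using System.Xml.Linq;
        List<XElement> jobs = new List<XElement>();
        using (XmlReader reader = XmlReader.Create(filePath))
        {
            reader.MoveToContent();
            // XNode.ReadFrom leaves the reader on the node after </job>, so Read()
            // is only called when the current node was not consumed. Calling it
            // unconditionally would skip a <job> that directly follows another.
            while (!reader.EOF)
            {
                if ((reader.NodeType == XmlNodeType.Element) && (reader.Name == "job"))
                {
                    jobs.Add((XElement)XNode.ReadFrom(reader));

                    if (jobs.Count >= 1000)
                    {
                        // TODO: write batch to database
                        jobs.Clear();
                    }
                }
                else
                {
                    reader.Read();
                }
            }

            if (jobs.Count > 0)
            {
                // TODO: write remainder to database
                jobs.Clear();
            }
        }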
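The streaming-and-batching idea can also be packaged as a pair of iterators so the database step sees one batch at a time. This is a minimal sketch rather than the answer's own code: `JobFeed`, `StreamJobs`, and `InBatches` are hypothetical names, the batch size of 2 is only for the demo, and the tiny in-memory feed stands in for the real 3 GB document.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class JobFeed
{
    // Lazily yields each <job> element; only one element is materialized at a time.
    public static IEnumerable<XElement> StreamJobs(XmlReader reader)
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "job")
            {
                // ReadFrom consumes the element and moves the reader past it,
                // so Read() must not be called again for this node.
                yield return (XElement)XNode.ReadFrom(reader);
            }
            else
            {
                reader.Read();
            }
        }
    }

    // Groups a lazy sequence into lists of at most batchSize items.
    public static IEnumerable<List<XElement>> InBatches(IEnumerable<XElement> source, int batchSize)
    {
        var batch = new List<XElement>(batchSize);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<XElement>(batchSize);
            }
        }
        if (batch.Count > 0) yield return batch;
    }
}

static class Program
{
    static void Main()
    {
        // In-memory stand-in for the feed; note there is no whitespace between jobs.
        const string xml =
            "<feed><job><title>A</title></job><job><title>B</title></job><job><title>C</title></job></feed>";

        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            foreach (var batch in JobFeed.InBatches(JobFeed.StreamJobs(reader), 2))
            {
                // Placeholder for the database write.
                Console.WriteLine("batch of " + batch.Count);
            }
        }
    }
}
```

With the three-job sample this prints `batch of 2` followed by `batch of 1`. For the real feed, `XmlReader.Create` also accepts a URI string or a `Stream`, so the same iterators should work against the remote file without loading it into memory first.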

Alternative using a DataSet.

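        // Requires: using System.Data; using System.Xml;
        DataSet ds = new DataSet();
        using (XmlReader reader = XmlReader.Create(filePath))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if ((reader.NodeType == XmlNodeType.Element) && (reader.Name == "job"))
                {
                    // ReadXml consumes the current <job> element, so Read() is
                    // not called again in this iteration.
                    ds.ReadXml(reader);

                    DataTable dt = ds.Tables["job"];
                    if (dt != null && dt.Rows.Count >= 1000)
                    {
                        // TODO: write batch to database
                        dt.Rows.Clear();
                    }
                }
                else
                {
                    reader.Read();
                }
            }

            // Tables["job"] is null if the feed contained no <job> elements.
            DataTable remainder = ds.Tables["job"];
            if (remainder != null && remainder.Rows.Count > 0)
            {
                // TODO: write remainder to database
                remainder.Rows.Clear();
            }
        }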
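For the `// TODO: write batch to database` step, one option is to copy a batch of `<job>` elements into a `DataTable` with a fixed schema, which can then be handed to whatever bulk-insert mechanism the database offers (for SQL Server, `SqlBulkCopy.WriteToServer(DataTable)` is one such call). The sketch below only covers the conversion: `JobBatch` and `ToDataTable` are hypothetical names, every column is assumed to be a string, and the column list is taken from the sample XML in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Xml.Linq;

static class JobBatch
{
    // Column names taken from the <job> children shown in the question.
    static readonly string[] Columns =
        { "title", "date", "referencenumber", "url", "company", "city", "state", "zip" };

    // Builds a DataTable with a fixed, all-string schema from a batch of <job> elements.
    public static DataTable ToDataTable(IEnumerable<XElement> jobs)
    {
        var table = new DataTable("job");
        foreach (var name in Columns)
        {
            table.Columns.Add(name, typeof(string));
        }

        foreach (var job in jobs)
        {
            DataRow row = table.NewRow();
            foreach (var name in Columns)
            {
                // The explicit string conversion returns null for a missing
                // child element; store DBNull in that case instead of throwing.
                row[name] = (object)(string)job.Element(name) ?? DBNull.Value;
            }
            table.Rows.Add(row);
        }
        return table;
    }
}

static class Demo
{
    static void Main()
    {
        var jobs = new[]
        {
            XElement.Parse(
                "<job><title><![CDATA[Certified Medical Assistant]]></title>" +
                "<city>Portland</city><state>OR</state><zip>97213</zip></job>")
        };

        DataTable table = JobBatch.ToDataTable(jobs);
        Console.WriteLine(table.Rows.Count);
        Console.WriteLine(table.Rows[0]["title"]);
        Console.WriteLine(table.Rows[0].IsNull("company"));
    }
}
```

Defining the columns up front avoids relying on schema inference for each batch, at the cost of having to keep the column list in sync with the feed.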
