ReadOuterXml is throwing OutOfMemoryException reading part of large (1 GB) XML file


Problem description


I am working on a large XML file, and while running the application, the XmlTextReader.ReadOuterXml() method throws an OutOfMemoryException.

The code looks like this:

XmlTextReader xr = null;
try
{
    xr = new XmlTextReader(fileName);
    while (xr.Read() && success)
    {
        if (xr.NodeType != XmlNodeType.Element) 
            continue;
        switch (xr.Name)
        {
            case "A":
                var xml = xr.ReadOuterXml();
                var n = GetDetails(xml);
                break;
        }
    }
}
catch (Exception ex)
{
    //Do stuff
}

where GetDetails() is:

private int GetDetails(string xml)
{
    var rootNode = XDocument.Parse(xml);
    var xnodes = rootNode.XPathSelectElements("//A/B").ToList();
    // Then work on the list of nodes
    return xnodes.Count;
}

Now, while loading the XML file, the application throws the exception on the xr.ReadOuterXml() line. What can be done to avoid this? The size of the XML is almost 1 GB.

Solution

The most likely reason you are getting an OutOfMemoryException in ReadOuterXml() is that you are trying to read a substantial portion of the 1 GB XML document into a single string, and are hitting the maximum string length in .NET.
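As a rough back-of-the-envelope sketch of why the string gets so large: .NET strings store UTF-16 characters at 2 bytes each, so text that is mostly ASCII on disk roughly doubles in size once it is held in a string. The class and method names below are hypothetical, and the "mostly single-byte UTF-8" input is an assumption about the file, not something stated in the question.

```csharp
using System;

public static class StringSizeEstimate
{
    // Assumption: the XML is mostly ASCII stored as UTF-8, so each byte on disk
    // becomes one 2-byte UTF-16 char once it is loaded into a .NET string.
    public static long EstimateUtf16Bytes(long utf8Bytes)
    {
        return utf8Bytes * sizeof(char);
    }

    public static void Main()
    {
        long oneGigabyte = 1L << 30;
        // Prints 2147483648, i.e. about 2 GB for a single string.
        Console.WriteLine(EstimateUtf16Bytes(oneGigabyte));
    }
}
```

An <A> element that spans most of the file would therefore need a string of about 2 GB, which is around the size where a single string allocation can be expected to fail.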

So, don't do that. Instead load directly from the XmlReader using XDocument.Load() with XmlReader.ReadSubtree():

using (var xr = XmlReader.Create(fileName))
{
    while (xr.Read() && success)
    {
        if (xr.NodeType != XmlNodeType.Element)
            continue;
        switch (xr.Name)
        {
            case "A":
                {
                    // ReadSubtree() positions the reader at the EndElement of the element read, so the 
                    // next call to Read() moves to the next node.
                    using (var subReader = xr.ReadSubtree())
                    {
                        var doc = XDocument.Load(subReader);
                        GetDetails(doc);
                    }
                }
                break;
        }
    }
}

And then in GetDetails() do:

private int GetDetails(XDocument rootDocument)
{
    var xnodes = rootDocument.XPathSelectElements("//A/B").ToList();
    // Then work on the list of nodes
    return xnodes.Count;
}

Not only will this use less memory, it will also be faster. ReadOuterXml() uses a temporary XmlWriter to copy the XML in the input stream to an output StringWriter, whose result you then parse a second time. This version of the algorithm skips that extra work entirely. It also avoids creating strings large enough to go on the large object heap, which can cause additional performance issues.
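To try the ReadSubtree() approach without the 1 GB file, here is a minimal self-contained sketch that runs the same loop over a small in-memory document. The SubtreeDemo class, the CountBPerA helper, and the sample XML are all made up for illustration; the real code would open the file with XmlReader.Create(fileName) as shown above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

public static class SubtreeDemo
{
    // Hypothetical sample input standing in for the large file.
    public const string SampleXml =
        "<Root><A><B>1</B><B>2</B></A><C/><A><B>3</B></A></Root>";

    // Returns the number of <B> children found in each <A> element, in document order.
    public static int[] CountBPerA(TextReader input)
    {
        var counts = new List<int>();
        using (var xr = XmlReader.Create(input))
        {
            while (xr.Read())
            {
                if (xr.NodeType != XmlNodeType.Element || xr.Name != "A")
                    continue;

                // Load only the current <A> subtree; no intermediate string is created.
                using (var subReader = xr.ReadSubtree())
                {
                    var doc = XDocument.Load(subReader);
                    counts.Add(doc.XPathSelectElements("//A/B").Count());
                }
            }
        }
        return counts.ToArray();
    }

    public static void Main()
    {
        var counts = CountBPerA(new StringReader(SampleXml));
        Console.WriteLine(string.Join(",", counts)); // prints "2,1"
    }
}
```

Note that each <A> subtree is still materialized as a whole XDocument, so peak memory depends on the size of the largest <A> element.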

If this is still using too much memory, you will need to implement SAX-like parsing for your XML, where you load only one <B> element at a time. First, introduce the following extension method:

public static partial class XmlReaderExtensions
{
    public static IEnumerable<XElement> WalkXmlElements(this XmlReader xmlReader, Predicate<Stack<XName>> filter)
    {
        Stack<XName> names = new Stack<XName>();

        while (xmlReader.Read())
        {
            if (xmlReader.NodeType == XmlNodeType.Element)
            {
                names.Push(XName.Get(xmlReader.LocalName, xmlReader.NamespaceURI));
                if (filter(names))
                {
                    using (var subReader = xmlReader.ReadSubtree())
                    {
                        yield return XElement.Load(subReader);
                    }
                }
            }

            if ((xmlReader.NodeType == XmlNodeType.Element && xmlReader.IsEmptyElement)
                || xmlReader.NodeType == XmlNodeType.EndElement)
            {
                names.Pop();
            }
        }
    }
}

Then, use it as follows:

using (var xr = XmlReader.Create(fileName))
{
    Predicate<Stack<XName>> filter =
        (stack) => stack.Peek().LocalName == "B" && stack.Count > 1 && stack.ElementAt(1).LocalName == "A";
    foreach (var element in xr.WalkXmlElements(filter))
    {
        //Then working on the specific node.
    }
}
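The streaming version can be tried on a small in-memory document with the sketch below. It repeats the walking logic in a compact form only so that the block compiles on its own; the StreamingDemo class, the IdsOfBUnderA helper, the id attributes, and the sample XML are hypothetical and not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class StreamingDemo
{
    // Hypothetical sample: two <B> elements under <A>, and one <B> under <D> that must be skipped.
    public const string SampleXml =
        "<Root><A><B id=\"1\"><C>x</C></B><B id=\"2\"/></A><D><B id=\"3\"/></D></Root>";

    // Compact copy of the WalkXmlElements logic shown above.
    public static IEnumerable<XElement> Walk(XmlReader reader, Predicate<Stack<XName>> filter)
    {
        var names = new Stack<XName>();
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element)
            {
                names.Push(XName.Get(reader.LocalName, reader.NamespaceURI));
                if (filter(names))
                {
                    using (var sub = reader.ReadSubtree())
                        yield return XElement.Load(sub);
                }
            }

            // After the subtree is consumed, the reader sits on the EndElement
            // (or on the empty element itself), so the name is popped here.
            if ((reader.NodeType == XmlNodeType.Element && reader.IsEmptyElement)
                || reader.NodeType == XmlNodeType.EndElement)
            {
                names.Pop();
            }
        }
    }

    // Collects the id attribute of every <B> whose direct parent is <A>.
    public static List<string> IdsOfBUnderA(TextReader input)
    {
        var ids = new List<string>();
        using (var xr = XmlReader.Create(input))
        {
            Predicate<Stack<XName>> filter = stack =>
                stack.Count > 1
                && stack.Peek().LocalName == "B"
                && stack.ElementAt(1).LocalName == "A";

            foreach (var b in Walk(xr, filter))
                ids.Add((string)b.Attribute("id"));
        }
        return ids;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", IdsOfBUnderA(new StringReader(SampleXml)))); // prints "1,2"
    }
}
```

Because the method is an iterator, only one <B> element is held in memory at a time, provided the caller does not collect the yielded elements into a list.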
