Use C# XmlSerializer to write in chunks for large sets of objects to avoid Out of Memory


Question

I like how XmlSerializer works: it is simple, elegant, and driven by attributes. However, I am running into an out-of-memory issue while building up a collection of all my objects prior to serializing them to an XML file.

I am populating an object from a SQL database and intend to write the object out to XML using XmlSerializer. It works great for small subsets, but if I try to grab all the objects from the DB I hit an OutOfMemoryException.

Does XmlSerializer have some feature that would allow me to grab a batch of 100 objects out of the database, write them, then grab the next batch of 100 objects and append them to the XML?

I am hoping I don't have to fall back to XmlDocument or something else that requires more manual coding effort...

Recommended answer

XmlSerializer can, in fact, stream enumerable data in and out when serializing. It has special handling for a class that implements IEnumerable<T>. From the docs:

The XmlSerializer gives special treatment to classes that implement IEnumerable or ICollection. A class that implements IEnumerable must implement a public Add method that takes a single parameter. The Add method's parameter must be of the same type as is returned from the Current property on the value returned from GetEnumerator, or one of that type's bases.

When serializing such a class, XmlSerializer simply iterates through the enumerable, writing each current value to the output stream. It does not load the entire enumerable into a list first. Thus, if you have a LINQ query that pages results of type T in from the database in chunks, you can serialize all of them without loading them all at once by using the following wrapper:

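using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Serialization;

// Proxy that wraps any enumerable and supplies the public `Add` method XmlSerializer requires.
// `Add` is never called during serialization, so it simply throws.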
public class EnumerableProxy<T> : IEnumerable<T>
{
    [XmlIgnore]
    public IEnumerable<T> BaseEnumerable { get; set; }

    public void Add(T obj)
    {
        throw new NotImplementedException();
    }

    #region IEnumerable<T> Members

    public IEnumerator<T> GetEnumerator()
    {
        if (BaseEnumerable == null)
            return Enumerable.Empty<T>().GetEnumerator();
        return BaseEnumerable.GetEnumerator();
    }

    #endregion

    #region IEnumerable Members

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }

    #endregion
}

Note that this class is only useful for serializing, not deserializing. Here is an example of how to use it:

public class RootObject<T>
{
    [XmlIgnore]
    public IEnumerable<T> Results { get; set; }

    [XmlArray("Results")]
    public EnumerableProxy<T> ResultsProxy { 
        get
        {
            return new EnumerableProxy<T> { BaseEnumerable = Results };
        }
        set
        {
            throw new NotImplementedException();
        }
    }
}

public class TestClass
{
    XmlWriter xmlWriter;
    TextWriter textWriter;

    public void Test()
    {
        try
        {
            var root = new RootObject<int>();
            root.Results = GetResults();

            using (textWriter = new StringWriter())
            {
                var settings = new XmlWriterSettings { Indent = true, IndentChars = "  " };
                using (xmlWriter = XmlWriter.Create(textWriter, settings))
                {
                    (new XmlSerializer(root.GetType())).Serialize(xmlWriter, root);
                }
                var xml = textWriter.ToString();
                Debug.WriteLine(xml);
            }
        }
        finally
        {
            xmlWriter = null;
            textWriter = null;
        }
    }

    IEnumerable<int> GetResults()
    {
        foreach (var i in Enumerable.Range(0, 1000))
        {
            if (i > 0 && (i % 500) == 0)
            {
                HalfwayPoint();
            }
            yield return i;
        }
    }

    private void HalfwayPoint()
    {
        if (xmlWriter != null)
        {
            xmlWriter.Flush();
            var xml = textWriter.ToString();
            Debug.WriteLine(xml);
        }
    }
}

If you set a breakpoint in HalfwayPoint(), you will see that half the XML has already been written out while the enumerable is still being iterated. (Of course, I'm just writing to a string for test purposes, while you would probably be writing to a file.)
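
For the original scenario (batches pulled from a database and written to a file), the same idea might look roughly like the sketch below. It is only an illustration under stated assumptions: Record, LazyItems<T>, ExportRoot, Demo, and FetchPage are hypothetical names that do not appear in the answer above, and FetchPage merely fabricates rows in memory where a real paged database query would go.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical row type, used only for this sketch.
public class Record
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Write-only proxy, same idea as EnumerableProxy<T> above (renamed to avoid a clash).
public class LazyItems<T> : IEnumerable<T>
{
    [XmlIgnore]
    public IEnumerable<T> BaseEnumerable { get; set; }

    // Required by XmlSerializer for enumerable types; never called when serializing.
    public void Add(T obj) { throw new NotSupportedException(); }

    public IEnumerator<T> GetEnumerator()
    {
        return (BaseEnumerable ?? Enumerable.Empty<T>()).GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public class ExportRoot
{
    [XmlIgnore]
    public IEnumerable<Record> Records { get; set; }

    [XmlArray("Records")]
    public LazyItems<Record> RecordsProxy
    {
        get { return new LazyItems<Record> { BaseEnumerable = Records }; }
        set { throw new NotSupportedException(); }
    }
}

public static class Demo
{
    // Assumption: stands in for a paged database query (one batch per call).
    static List<Record> FetchPage(int offset, int pageSize, int total)
    {
        var page = new List<Record>();
        for (int i = offset; i < Math.Min(offset + pageSize, total); i++)
            page.Add(new Record { Id = i, Name = "item" + i });
        return page;
    }

    // Lazily yields records page by page; only the current page is held in memory.
    public static IEnumerable<Record> ReadAll(int pageSize, int total)
    {
        for (int offset = 0; offset < total; offset += pageSize)
            foreach (var record in FetchPage(offset, pageSize, total))
                yield return record;
    }

    // Serializes straight to a file; pages are fetched as the serializer iterates.
    public static void Export(string path, int pageSize, int total)
    {
        var root = new ExportRoot { Records = ReadAll(pageSize, total) };
        var settings = new XmlWriterSettings { Indent = true };
        using (var stream = File.Create(path))
        using (var writer = XmlWriter.Create(stream, settings))
        {
            new XmlSerializer(typeof(ExportRoot)).Serialize(writer, root);
        }
    }
}
```

Calling Demo.Export("records.xml", 100, 250) would pull three pages in turn while the serializer writes. Because the iterator in ReadAll only requests the next page once the previous one has been consumed, memory use should stay close to one page rather than the whole result set.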
