C# - OutOfMemoryException saving a List on a JSON file


Question

I'm trying to save the streaming data of a pressure map. I have a pressure matrix defined as:

double[,] pressureMatrix = new double[e.Data.GetLength(0), e.Data.GetLength(1)];

I receive one such pressureMatrix every 10 milliseconds, and I want to save all of them in a JSON file so that the recording can be reproduced later.

First, I write what I call the header, which holds all the settings used for the recording:

recordedData.softwareVersion = Assembly.GetExecutingAssembly().GetName().Version.Major.ToString() + "." + Assembly.GetExecutingAssembly().GetName().Version.Minor.ToString();
recordedData.calibrationConfiguration = calibrationConfiguration;
recordedData.representationConfiguration = representationSettings;
recordedData.pressureData = new List<PressureMap>();

var json = JsonConvert.SerializeObject(csvRecordedData, Formatting.None);

File.WriteAllText(this.filePath, json);

Then, every time I get a new pressure map, I create a new Thread that adds the new PressureMap to the list and rewrites the whole file:

var newPressureMatrix = new PressureMap(datos, DateTime.Now);
recordedData.pressureData.Add(newPressureMatrix);
var json = JsonConvert.SerializeObject(recordedData, Formatting.None);
File.WriteAllText(this.filePath, json);

After about 20-30 minutes I get an OutOfMemoryException, because the system can no longer hold the recordedData variable: the List<PressureMap> inside it has grown too large.

How can I handle this so that the data gets saved? I would like to record 24-48 hours of information.

Recommended answer

Your basic problem is that you are holding all of your pressure map samples in memory, rather than writing each one individually and then allowing it to be garbage collected. What's worse, you are doing this in two different places:

  1. You serialize your entire list of samples to a JSON string, json, before writing that string to a file.

     Instead, as explained in Performance Tips: Optimize Memory Usage, you should serialize and deserialize directly to and from your file in such situations. For instructions on how to do this, see Can Json.NET serialize / deserialize to / from a stream? (https://stackoverflow.com/q/8157636/3744182) and also Serialize JSON to a file.

  2. recordedData.pressureData = new List<PressureMap>(); accumulates all pressure map samples, and every one of them is written out again each time a new sample arrives.

A better solution would be to write each sample once and then forget it, but because each sample must be nested inside some container objects in the JSON, it is not obvious how to do that.
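For issue #1, serializing straight to the file can be sketched roughly as follows. This is only an illustration: JsonFileHelper, SerializeToFile and DeserializeFromFile are hypothetical names, not part of Json.NET or of the question's code. The point is that the serializer writes through a JsonTextWriter to the file stream, so the complete JSON text never exists as a single string in memory.

```csharp
using System.IO;
using System.Text;
using Newtonsoft.Json;

// Hypothetical helper, named here for illustration only.
public static class JsonFileHelper
{
    // Serializes value directly to the file at path, without building
    // an intermediate JSON string.
    public static void SerializeToFile<T>(string path, T value)
    {
        using (var stream = new FileStream(path, FileMode.Create))
        using (var textWriter = new StreamWriter(stream, new UTF8Encoding(false)))
        using (var jsonWriter = new JsonTextWriter(textWriter) { Formatting = Formatting.None })
        {
            JsonSerializer.CreateDefault().Serialize(jsonWriter, value);
        }
    }

    // Reads the file token by token instead of loading it as one string first.
    public static T DeserializeFromFile<T>(string path)
    {
        using (var stream = File.OpenRead(path))
        using (var textReader = new StreamReader(stream))
        using (var jsonReader = new JsonTextReader(textReader))
        {
            return JsonSerializer.CreateDefault().Deserialize<T>(jsonReader);
        }
    }
}
```

On its own this only removes the large intermediate string. The list of samples still grows without bound, which is issue #2.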

So, how to attack issue #2?

First, let's modify your data model as follows, partitioning the header data into a separate class:

public class PressureMap
{
    public double[,] PressureMatrix { get; set; }
}

public class CalibrationConfiguration 
{
    // Data model not included in question
}

public class RepresentationConfiguration 
{
    // Data model not included in question
}

public class RecordedDataHeader
{
    public string SoftwareVersion { get; set; }
    public CalibrationConfiguration CalibrationConfiguration { get; set; }
    public RepresentationConfiguration RepresentationConfiguration { get; set; }
}

public class RecordedData
{
    // Ensure the header is serialized first.
    [JsonProperty(Order = 1)]
    public RecordedDataHeader RecordedDataHeader { get; set; }
    // Ensure the pressure data is serialized last.
    [JsonProperty(Order = 2)]
    public IEnumerable<PressureMap> PressureData { get; set; }
}

Option #1 is a version of the producer-consumer pattern. It involves spinning up two threads: one to generate PressureData samples, and one to serialize the RecordedData. The first thread generates samples and adds them to a BlockingCollection<PressureMap> that is shared with the second thread. The second thread then serializes BlockingCollection<PressureMap>.GetConsumingEnumerable() as the value of RecordedData.PressureData.

The following code gives a skeleton for how to do this:

var sampleCount = 400;    // Or whatever stopping criterion you prefer
var sampleInterval = 10;  // in ms

using (var pressureData = new BlockingCollection<PressureMap>())
{
    // Adapted from
    // https://docs.microsoft.com/en-us/dotnet/standard/collections/thread-safe/blockingcollection-overview
    // https://docs.microsoft.com/en-us/dotnet/api/system.collections.concurrent.blockingcollection-1?view=netframework-4.7.2

    // Spin up a Task to sample the pressure maps
    using (Task t1 = Task.Factory.StartNew(() =>
    {
        for (int i = 0; i < sampleCount; i++)
        {
            var data = GetPressureMap(i);
            Console.WriteLine("Generated sample {0}", i);
            pressureData.Add(data);
            System.Threading.Thread.Sleep(sampleInterval);
        }
        pressureData.CompleteAdding();
    }))
    {
        // Spin up a Task to consume the BlockingCollection
        using (Task t2 = Task.Factory.StartNew(() =>
        {
            var recordedDataHeader = new RecordedDataHeader
            {
                SoftwareVersion = softwareVersion,
                CalibrationConfiguration = calibrationConfiguration,
                RepresentationConfiguration = representationConfiguration,
            };

            var settings = new JsonSerializerSettings
            {
                ContractResolver = new CamelCasePropertyNamesContractResolver(),
            };

            using (var stream = new FileStream(this.filePath, FileMode.Create))
            using (var textWriter = new StreamWriter(stream))
            using (var jsonWriter = new JsonTextWriter(textWriter))
            {
                int j = 0;

                var query = pressureData
                    .GetConsumingEnumerable()
                    .Select(p => 
                            { 
                                // Flush the writer periodically in case the process terminates abnormally
                                jsonWriter.Flush();
                                Console.WriteLine("Serializing item {0}", j++);
                                return p;
                            });

                var recordedData = new RecordedData
                {
                    RecordedDataHeader = recordedDataHeader,
                    // Since PressureData is declared as IEnumerable<PressureMap>, evaluation will be lazy.
                    PressureData = query,
                };                          

                Console.WriteLine("Beginning serialization of {0} to {1}:", recordedData, this.filePath);
                // Serialize through the JsonTextWriter that the query flushes periodically.
                JsonSerializer.CreateDefault(settings).Serialize(jsonWriter, recordedData);
                Console.WriteLine("Finished serialization of {0} to {1}.", recordedData, this.filePath);
            }
        }))
        {
            Task.WaitAll(t1, t2);
        }
    }
}

Notes:

  • This solution relies on the fact that, when serializing an IEnumerable<T>, Json.NET will not materialize the enumerable as a list. Instead it takes full advantage of lazy evaluation and simply enumerates through it, writing and then forgetting each item it encounters.

  • The first thread samples PressureData and adds each sample to the blocking collection.

  • The second thread wraps the blocking collection in an IEnumerable<PressureMap>, then serializes that as RecordedData.PressureData.

    During serialization, the serializer enumerates through the IEnumerable<PressureMap>, streaming each item to the JSON file and then proceeding to the next, effectively blocking until one becomes available.

  • You will need to experiment to make sure that the serialization thread can "keep up" with the sampling thread, possibly by setting a BoundedCapacity during construction. If it cannot, you may need to adopt a different strategy.

  • PressureMap GetPressureMap(int count) should be some method of yours (not shown in the question) that returns the current pressure map sample.

  • With this technique the JSON file remains open for the duration of the sampling session. If sampling terminates abnormally, the file may be truncated. I make some attempt to ameliorate the problem by flushing the writer periodically.

  • While serialization no longer requires unbounded amounts of memory, deserializing a RecordedData later will deserialize the PressureData array into a concrete List<PressureMap>. This may cause memory issues during downstream processing.

Demo fiddle #1 here.
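The BoundedCapacity suggestion in the notes above can be tried out in isolation. The sketch below is an assumption-laden illustration rather than part of the original skeleton: BoundedQueueDemo and Run are made-up names, and a small double[,] stands in for a real PressureMap. With a bounded BlockingCollection<T>, Add blocks once the queue holds capacity items, so a slow consumer throttles the producer instead of letting queued samples pile up in memory.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical demo class; a double[,] stands in for a pressure map sample.
public static class BoundedQueueDemo
{
    // Produces sampleCount items through a queue that holds at most
    // capacity items, and returns how many the consumer received.
    public static int Run(int sampleCount, int capacity)
    {
        int consumed = 0;

        using (var queue = new BlockingCollection<double[,]>(boundedCapacity: capacity))
        {
            var producer = Task.Run(() =>
            {
                try
                {
                    for (int i = 0; i < sampleCount; i++)
                    {
                        // Blocks while the queue already holds capacity items.
                        queue.Add(new double[4, 4]);
                    }
                }
                finally
                {
                    // Lets GetConsumingEnumerable() finish once the queue drains.
                    queue.CompleteAdding();
                }
            });

            var consumer = Task.Run(() =>
            {
                foreach (var sample in queue.GetConsumingEnumerable())
                {
                    consumed++; // a real consumer would serialize the sample here
                }
            });

            Task.WaitAll(producer, consumer);
        }

        return consumed;
    }
}
```

If samples arrive from hardware at a fixed rate, a blocking Add may not be acceptable on the receiving thread. In that case BlockingCollection<T>.TryAdd with a timeout is one way to detect that the writer has fallen behind, at the cost of deciding what to do with the sample that did not fit.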

Option #2 would be to switch from a JSON file to a Newline Delimited JSON file. Such a file consists of a sequence of JSON objects separated by newline characters. In your case, the first object would contain the RecordedDataHeader information, and each subsequent object would be of type PressureMap:

var sampleCount = 100; // Or whatever
var sampleInterval = 10;

var recordedDataHeader = new RecordedDataHeader
{
    SoftwareVersion = softwareVersion,
    CalibrationConfiguration = calibrationConfiguration,
    RepresentationConfiguration = representationConfiguration,
};

var settings = new JsonSerializerSettings
{
    ContractResolver = new CamelCasePropertyNamesContractResolver(),
};

// Write the header
Console.WriteLine("Beginning serialization of sample data to {0}.", this.filePath);

using (var stream = new FileStream(this.filePath, FileMode.Create))
{
    JsonExtensions.ToNewlineDelimitedJson(stream, new[] { recordedDataHeader });
}

// Write each sample incrementally

for (int i = 0; i < sampleCount; i++)
{
    Thread.Sleep(sampleInterval);
    Console.WriteLine("Performing sample {0} of {1}", i, sampleCount);
    var map = GetPressureMap(i);

    using (var stream = new FileStream(this.filePath, FileMode.Append))
    {
        JsonExtensions.ToNewlineDelimitedJson(stream, new[] { map });
    }
}

Console.WriteLine("Finished serialization of sample data to {0}.", this.filePath);

Using the extension methods:

public static partial class JsonExtensions
{
    // Adapted from the answer to
    // https://stackoverflow.com/questions/44787652/serialize-as-ndjson-using-json-net
    // by dbc https://stackoverflow.com/users/3744182/dbc
    public static void ToNewlineDelimitedJson<T>(Stream stream, IEnumerable<T> items)
    {
        // Let caller dispose the underlying stream 
        using (var textWriter = new StreamWriter(stream, new UTF8Encoding(false, true), 1024, true))
        {
            ToNewlineDelimitedJson(textWriter, items);
        }
    }

    public static void ToNewlineDelimitedJson<T>(TextWriter textWriter, IEnumerable<T> items)
    {
        var serializer = JsonSerializer.CreateDefault();

        foreach (var item in items)
        {
            // Formatting.None is the default; I set it here for clarity.
            using (var writer = new JsonTextWriter(textWriter) { Formatting = Formatting.None, CloseOutput = false })
            {
                serializer.Serialize(writer, item);
            }
            // http://specs.okfnlabs.org/ndjson/
            // Each JSON text MUST conform to the [RFC7159] standard and MUST be written to the stream followed by the newline character \n (0x0A).
            // The newline character MAY be preceded by a carriage return \r (0x0D). The JSON texts MUST NOT contain newlines or carriage returns.
            textWriter.Write("\n");
        }
    }

    // Adapted from the answer to 
    // https://stackoverflow.com/questions/29729063/line-delimited-json-serializing-and-de-serializing
    // by Yuval Itzchakov https://stackoverflow.com/users/1870803/yuval-itzchakov
    public static IEnumerable<TBase> FromNewlineDelimitedJson<TBase, THeader, TRow>(TextReader reader)
        where THeader : TBase
        where TRow : TBase
    {
        bool first = true;

        using (var jsonReader = new JsonTextReader(reader) { CloseInput = false, SupportMultipleContent = true })
        {
            var serializer = JsonSerializer.CreateDefault();

            while (jsonReader.Read())
            {
                if (jsonReader.TokenType == JsonToken.Comment)
                    continue;
                if (first)
                {
                    yield return serializer.Deserialize<THeader>(jsonReader);
                    first = false;
                }
                else
                {
                    yield return serializer.Deserialize<TRow>(jsonReader);
                }
            }
        }
    }
}

Later, you can process the newline delimited JSON file as follows:

using (var stream = File.OpenRead(filePath))
using (var textReader = new StreamReader(stream))
{
    foreach (var obj in JsonExtensions.FromNewlineDelimitedJson<object, RecordedDataHeader, PressureMap>(textReader))
    {
        if (obj is RecordedDataHeader)
        {
            var header = (RecordedDataHeader)obj;
            // Process the header
            Console.WriteLine(JsonConvert.SerializeObject(header));
        }
        else
        {
            var row = (PressureMap)obj;
            // Process the row.
            Console.WriteLine(JsonConvert.SerializeObject(row));
        }
    }
}

Notes:

  • This approach looks simpler because the samples are appended incrementally to the end of the file, rather than inserted inside some overall JSON container.

  • With this approach, both serialization and downstream processing can be done with bounded memory use.

  • The sample file does not remain open for the duration of sampling, so it is less likely to be truncated.

  • Downstream applications may not have built-in tools for processing newline delimited JSON.

  • This strategy may integrate more simply with your current threading code.

Demo fiddle #2 here.
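As a rough idea of how Option #2 might plug into code that receives samples on its own thread, here is a minimal sketch. NdjsonRecorder and its members are hypothetical names, not taken from the question or from Json.NET. Unlike the loop above, which reopens the file for every sample, this variant keeps one StreamWriter open and flushes after each line; that is a deliberate trade-off for a 10 ms sample interval, and whether it suits your setup is an assumption you would need to check.

```csharp
using System;
using System.IO;
using Newtonsoft.Json;

// Hypothetical recorder: one JSON document per line, appended as samples arrive.
public sealed class NdjsonRecorder : IDisposable
{
    private readonly object gate = new object();
    private readonly StreamWriter writer;

    // Creates (or overwrites) the file and writes the header as the first line.
    public NdjsonRecorder(string path, object header)
    {
        writer = new StreamWriter(path, append: false);
        WriteLine(header);
    }

    // May be called from whichever thread receives each sample.
    public void Append(object sample)
    {
        WriteLine(sample);
    }

    private void WriteLine(object value)
    {
        // Formatting.None keeps each JSON document on a single line.
        var line = JsonConvert.SerializeObject(value, Formatting.None);

        lock (gate)
        {
            writer.Write(line);
            writer.Write('\n');
            // Flush so that an abnormal exit loses at most the current line.
            writer.Flush();
        }
    }

    public void Dispose()
    {
        lock (gate)
        {
            writer.Dispose();
        }
    }
}
```

In the question's code, the handler that currently adds to recordedData.pressureData and rewrites the file would instead call Append with the new PressureMap, so each sample is serialized once and can then be garbage collected.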
