Serialize/Deserialize Large DataSet
Question
I have a reporting tool that sends query requests to a server. Once the server has run the query, it sends the result back to the requesting reporting tool. Communication uses WCF.
The queried data, stored in a DataSet object, is very large, usually around 100 MB.
To speed up transmission, I serialize the DataSet with BinaryFormatter and compress it. The object transmitted between the server and the reporting tool is a byte array.
However, after a few requests the reporting tool throws an OutOfMemoryException when it tries to deserialize the DataSet. The exception is thrown when I call:
dataSet = (DataSet) formatter.Deserialize(dstream);
dstream is the DeflateStream used to decompress the transmitted compressed byte array.
The exception occurs in a sub-call of formatter.Deserialize, at the point where an array is created from the stream.
Is there another way to do binary serialization that has a better mechanism for avoiding this exception?
Implementation:
The method that serializes and compresses the DataSet (used by the server):
    public static byte[] Compress(DataSet dataSet)
    {
        using (var input = new MemoryStream())
        {
            var binaryFormatter = new BinaryFormatter();
            binaryFormatter.Serialize(input, dataSet);
            using (var output = new MemoryStream())
            {
                using (var compressor = new DeflateStream(output, CompressionLevel.Optimal))
                {
                    input.Position = 0;
                    var buffer = new byte[1024];
                    int read;
                    while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
                        compressor.Write(buffer, 0, read);
                    compressor.Close();
                    return output.ToArray();
                }
            }
        }
    }
The method that decompresses and deserializes the DataSet (used by the reporting tool):
    public static DataSet Decompress(byte[] data)
    {
        DataSet dataSet;
        using (var input = new MemoryStream(data))
        {
            using (var dstream = new DeflateStream(input, CompressionMode.Decompress))
            {
                var formatter = new BinaryFormatter();
                dataSet = (DataSet) formatter.Deserialize(dstream);
            }
        }
        return dataSet;
    }
Stack trace:
at System.Array.InternalCreate(Void* elementType, Int32 rank, Int32* pLengths, Int32* pLowerBounds)
at System.Array.CreateInstance(Type elementType, Int32 length)
at System.Array.UnsafeCreateInstance(Type elementType, Int32 length)
at System.Runtime.Serialization.Formatters.Binary.ObjectReader.ParseArray(ParseRecord pr)
at System.Runtime.Serialization.Formatters.Binary.ObjectReader.ParseObject(ParseRecord pr)
at System.Runtime.Serialization.Formatters.Binary.ObjectReader.Parse(ParseRecord pr)
at System.Runtime.Serialization.Formatters.Binary.__BinaryParser.ReadArray(BinaryHeaderEnum binaryHeaderEnum)
at System.Runtime.Serialization.Formatters.Binary.__BinaryParser.Run()
at System.Runtime.Serialization.Formatters.Binary.ObjectReader.Deserialize(HeaderHandler handler, __BinaryParser serParser, Boolean fCheck, Boolean isCrossAppDomain, IMethodCallMessage methodCallMessage)
at System.Runtime.Serialization.Formatters.Binary.BinaryFormatter.Deserialize(Stream serializationStream, HeaderHandler handler, Boolean fCheck, Boolean isCrossAppDomain, IMethodCallMessage methodCallMessage)
at System.Runtime.Serialization.Formatters.Binary.BinaryFormatter.Deserialize(Stream serializationStream)
at DRX.PTClientMonitoring.Infrastructure.Helper.DataSetCompressor.Decompress(Byte[] data) in c:\_develop\PTClientMonitoringTool\PTClientMonitoringTool\Source\DRX.PTClientMonitoring.Infrastructure\Helper\DataSetCompressor.cs:line 51
at DRX.PTClientMonitoring.Reporting.ViewModels.ShellViewModel.<>c__DisplayClassf.<ExecudeDefinedQuery>b__4() in c:\_develop\PTClientMonitoringTool\PTClientMonitoringTool\Source\DRX.PTClientMonitoring.Reporting\ViewModels\ShellViewModel.cs:line 347
Recommended answer
Before serializing, set:
yourDataSet.RemotingFormat = SerializationFormat.Binary;
That should help a lot. The default, even when using BinaryFormatter, is XML.
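As a rough illustration of where that setting would go, here is a minimal sketch of the server-side method with RemotingFormat applied. It assumes .NET Framework 4.5 or later, as in the question. The class name BinaryDataSetCompressor is hypothetical, and serializing straight into the DeflateStream (instead of through an intermediate MemoryStream) is an extra change that the answer does not prescribe.

```csharp
using System.Data;
using System.IO;
using System.IO.Compression;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical helper name; not part of the original project.
public static class BinaryDataSetCompressor
{
    public static byte[] Compress(DataSet dataSet)
    {
        // Switch the DataSet from its default XML payload to a binary payload.
        // Note: this changes the property on the caller's DataSet instance.
        dataSet.RemotingFormat = SerializationFormat.Binary;

        using (var output = new MemoryStream())
        {
            // leaveOpen: true keeps 'output' readable after the compressor is disposed.
            using (var compressor = new DeflateStream(output, CompressionLevel.Optimal, leaveOpen: true))
            {
                // Serialize directly into the compressor, so no uncompressed
                // copy of the payload is buffered in memory first.
                new BinaryFormatter().Serialize(compressor, dataSet);
            }

            // The compressor has been flushed by Dispose, so the array is complete.
            return output.ToArray();
        }
    }
}
```

The Decompress method from the question should be able to read this output unchanged, since the format choice is recorded in the serialized payload itself.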
Note, however, that DataSet and DataTable are inherently not great candidates for optimization. There are a lot of great serialization tools that will do a much better job of packing your data, but they invariably require strong type models, i.e. List<SomeSpecificType> where SomeSpecificType is a POCO/DTO class. Even WCF only barely tolerates DataTable/DataSet. So if you can get rid of your dependency on DataTable/DataSet, I strongly advise doing so.
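To make the typed-model idea concrete, here is a small sketch that serializes a List of DTOs instead of a DataSet. The answer does not name a specific serializer, so this uses the built-in DataContractSerializer with the binary XML writer as a stand-in. ReportRow and its three properties are hypothetical placeholders for whatever columns the real query returns, and ReportRowSerializer is likewise an invented name.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Runtime.Serialization;
using System.Xml;

// Hypothetical row type; the real properties depend on the query's columns.
[DataContract]
public class ReportRow
{
    [DataMember] public int Id { get; set; }
    [DataMember] public string Name { get; set; }
    [DataMember] public double Value { get; set; }
}

// Hypothetical helper; DataContractSerializer is a stand-in for any typed serializer.
public static class ReportRowSerializer
{
    private static readonly DataContractSerializer Serializer =
        new DataContractSerializer(typeof(List<ReportRow>));

    public static void Write(Stream destination, List<ReportRow> rows)
    {
        // The writer is disposed first and flushes into the compressor,
        // which in turn flushes into 'destination' and leaves it open.
        using (var deflate = new DeflateStream(destination, CompressionLevel.Optimal, leaveOpen: true))
        using (var writer = XmlDictionaryWriter.CreateBinaryWriter(deflate))
        {
            Serializer.WriteObject(writer, rows);
        }
    }

    public static List<ReportRow> Read(Stream source)
    {
        using (var inflate = new DeflateStream(source, CompressionMode.Decompress, leaveOpen: true))
        using (var reader = XmlDictionaryReader.CreateBinaryReader(inflate, XmlDictionaryReaderQuotas.Max))
        {
            return (List<ReportRow>)Serializer.ReadObject(reader);
        }
    }
}
```

Because both methods take a Stream rather than a byte array, the same code can write to a file or a network stream instead of a MemoryStream.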
Another option is to send the data as a Stream. I'm pretty sure WCF supports this natively, and in theory it would let you use a different Stream (not MemoryStream) that can actually be much larger. As a cheap option you could use a temporary file as a scratch area; if that works, you could investigate a custom in-memory stream that stitches multiple buffers together.
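The temporary-file variant might look roughly like the sketch below. It assumes the client receives the compressed payload as a Stream (for example from a WCF operation that returns Stream over a binding with TransferMode.Streamed) rather than as a byte array. TempFileDataSetReader is a hypothetical name. This only keeps the transport payload out of managed memory; the deserialized DataSet itself still has to fit in memory.

```csharp
using System.Data;
using System.IO;
using System.IO.Compression;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical helper name; not part of the original project.
public static class TempFileDataSetReader
{
    public static DataSet Read(Stream incoming)
    {
        // Scratch file that holds the compressed payload instead of a large byte[].
        string scratchPath = Path.GetTempFileName();
        try
        {
            // Spool the incoming stream to disk in small chunks.
            using (var scratch = File.Create(scratchPath))
            {
                incoming.CopyTo(scratch);
            }

            // Decompress and deserialize directly from the file.
            using (var file = File.OpenRead(scratchPath))
            using (var inflate = new DeflateStream(file, CompressionMode.Decompress))
            {
                return (DataSet)new BinaryFormatter().Deserialize(inflate);
            }
        }
        finally
        {
            // Always remove the scratch file, even if deserialization fails.
            File.Delete(scratchPath);
        }
    }
}
```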