How do I serialize a large graph of .NET objects into a SQL Server BLOB without creating a large buffer?


Problem description

We have code like:

Dim ms As New IO.MemoryStream()
Dim bin As New System.Runtime.Serialization.Formatters.Binary.BinaryFormatter()
bin.Serialize(ms, largeGraphOfObjects)
Dim dataToSaveToDatabase As Byte() = ms.ToArray()
' Put dataToSaveToDatabase in a SQL Server BLOB

But the MemoryStream allocates a large buffer from the large object heap, and that is giving us problems. So how can we stream the data without needing enough free memory to hold the serialized objects?

I am looking for a way to get a Stream from SQL Server that can then be passed to bin.Serialize(), so avoiding keeping all the data in my process's memory.

The same applies to reading the data back.

Some more background:

This is part of a complex numerical processing system that processes data in near real time looking for equipment problems etc. The serialization is done to allow a restart when there is a problem with data quality from a data feed etc. (We store the data feeds and can rerun them after the operator has edited out bad values.)

Therefore we serialize the objects a lot more often than we deserialize them.

The objects we are serializing include very large arrays, mostly of doubles, as well as a lot of small "more normal" objects. We are pushing the memory limit on a 32-bit system and making the garbage collector work very hard. (Efforts are being made elsewhere in the system to improve this, e.g. reusing large arrays rather than creating new ones.)

Often the serialization of the state is the last straw that causes an out-of-memory exception; our peak memory usage occurs while this serialization is being done.

I think we get large object heap fragmentation when we deserialize the object, and I expect there are also other problems with large object heap fragmentation given the size of the arrays. (This has not yet been investigated, as the person who first looked at this is a numerical processing expert, not a memory management expert.)

Our customers use a mix of SQL Server 2000, 2005 and 2008, and we would rather not have different code paths for each version of SQL Server if possible.

We can have many active models at a time (in different processes, across many machines); each model can have many saved states. Hence the saved state is stored in a database BLOB rather than in a file.

As the speed of saving the state is important, I would rather not serialize the object to a file and then put the file in a BLOB one block at a time.

Other related questions I have asked:

  • How to Stream data from/to SQL Server BLOB fields? (http://stackoverflow.com/questions/2101346/how-to-stream-data-from-to-sql-server-blob-fields)
  • Is there a SqlFileStream like class that works with Sql Server 2005? (http://stackoverflow.com/questions/2116291/is-there-a-sqlfilestream-like-class-that-works-with-sql-server-2005)

Answer

There is no built-in ADO.NET functionality to handle this really gracefully for large data. The problem is twofold:

  • There is no API to write into a SQL command or parameter as into a stream. The parameter types that accept a stream (like FileStream) accept the stream to READ from it, which does not agree with the serialization semantics of writing into a stream. No matter which way you turn this, you end up with an in-memory copy of the entire serialized object, which is bad.
  • Even if the point above could be solved (and it cannot be), the TDS protocol and the way SQL Server accepts parameters do not work well with large parameters, as the entire request has to be received before it is launched into execution, and this would create additional copies of the object inside SQL Server.

So you really have to approach this from a different angle. Fortunately, there is a fairly easy solution. The trick is to use the highly efficient UPDATE ... .WRITE syntax and pass in the chunks of data one by one in a series of T-SQL statements. This is the MSDN-recommended way; see Modifying Large-Value (max) Data in ADO.NET. This looks complicated, but is actually trivial to do and to plug into a Stream class.
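
For reference, the per-chunk operation that the class below wraps is a single parameterized UPDATE. A minimal sketch (conn, trn, chunk and id are assumed to exist; the table and column names are borrowed from the example further down):

// Append one chunk to an existing, non-NULL varbinary(max) value.
// Passing NULL for .WRITE's offset and length arguments appends
// the chunk at the end of the current value.
SqlCommand cmd = new SqlCommand(
    @"UPDATE dbo.Uploads
        SET FileData.WRITE(@chunk, NULL, NULL)
        WHERE Id = @id", conn, trn);
cmd.Parameters.Add("@chunk", SqlDbType.VarBinary, -1).Value = chunk;
cmd.Parameters.AddWithValue("@id", id);
cmd.ExecuteNonQuery();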

The BlobStream class

This is the bread and butter of the solution: a Stream-derived class that implements the Write method as a call to the T-SQL BLOB WRITE syntax. It is straightforward; the only interesting thing about it is that it has to keep track of the first update, because the UPDATE ... SET blob.WRITE(...) syntax would fail on a NULL field:

using System;
using System.Data;
using System.Data.SqlClient;
using System.IO;

// A write-only Stream that sends each chunk to a varbinary(max) column
// using the T-SQL UPDATE ... .WRITE(...) syntax.
class BlobStream : Stream
{
    private SqlCommand cmdAppendChunk;
    private SqlCommand cmdFirstChunk;
    private SqlConnection connection;
    private SqlTransaction transaction;

    private SqlParameter paramChunk;
    private SqlParameter paramLength;

    private long offset;

    public BlobStream(
        SqlConnection connection,
        SqlTransaction transaction,
        string schemaName,
        string tableName,
        string blobColumn,
        string keyColumn,
        object keyValue)
    {
        this.transaction = transaction;
        this.connection = connection;
        cmdFirstChunk = new SqlCommand(String.Format(@"
UPDATE [{0}].[{1}]
    SET [{2}] = @firstChunk
    WHERE [{3}] = @key"
            ,schemaName, tableName, blobColumn, keyColumn)
            , connection, transaction);
        cmdFirstChunk.Parameters.AddWithValue("@key", keyValue);
        cmdAppendChunk = new SqlCommand(String.Format(@"
UPDATE [{0}].[{1}]
    SET [{2}].WRITE(@chunk, NULL, NULL)
    WHERE [{3}] = @key"
            , schemaName, tableName, blobColumn, keyColumn)
            , connection, transaction);
        cmdAppendChunk.Parameters.AddWithValue("@key", keyValue);
        paramChunk = new SqlParameter("@chunk", SqlDbType.VarBinary, -1);
        cmdAppendChunk.Parameters.Add(paramChunk);
    }

    public override void Write(byte[] buffer, int index, int count)
    {
        // .WRITE needs exactly the bytes to append, so copy out the
        // requested slice unless the whole buffer is being written.
        byte[] bytesToWrite = buffer;
        if (index != 0 || count != buffer.Length)
        {
            bytesToWrite = new MemoryStream(buffer, index, count).ToArray();
        }
        if (offset == 0)
        {
            // First chunk: plain SET, because .WRITE fails on a NULL field.
            cmdFirstChunk.Parameters.AddWithValue("@firstChunk", bytesToWrite);
            cmdFirstChunk.ExecuteNonQuery();
            offset = count;
        }
        else
        {
            // Later chunks: append to the end of the existing value.
            paramChunk.Value = bytesToWrite;
            cmdAppendChunk.ExecuteNonQuery();
            offset += count;
        }
    }

    // Rest of the abstract Stream implementation (a minimal sketch follows below)
 }
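
The answer leaves the rest of the abstract Stream members as an exercise. Here is one minimal way to fill them in inside BlobStream, treating it as a write-only, non-seekable stream (this sketch is my assumption of a reasonable implementation, not the original author's code):

// Inside BlobStream: minimal implementations of the remaining members.
public override bool CanRead { get { return false; } }
public override bool CanSeek { get { return false; } }
public override bool CanWrite { get { return true; } }

// Length and Position only track how many bytes have been written so far.
public override long Length { get { return offset; } }
public override long Position
{
    get { return offset; }
    set { throw new NotSupportedException(); }
}

// Every Write executes its UPDATE immediately, so there is nothing to flush.
public override void Flush() { }

public override int Read(byte[] buffer, int index, int count)
{ throw new NotSupportedException(); }
public override long Seek(long offset, SeekOrigin origin)
{ throw new NotSupportedException(); }
public override void SetLength(long value)
{ throw new NotSupportedException(); }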


Using the BlobStream

To use this newly created blob stream class you plug it into a BufferedStream. The class has a trivial design that handles only writing the stream into a column of a table. I'll reuse a table from another example:

CREATE TABLE [dbo].[Uploads](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [FileName] [varchar](256) NULL,
    [ContentType] [varchar](256) NULL,
    [FileData] [varbinary](max) NULL)

I'll add a dummy object to be serialized:

[Serializable]
class HugeSerialized
{
    public byte[] theBigArray { get; set; }
}

Finally, the actual serialization. We'll first insert a new record into the Uploads table, then create a BlobStream on the newly inserted Id and call the serialization straight into this stream:

using (SqlConnection conn = new SqlConnection(Settings.Default.connString))
{
    conn.Open();
    using (SqlTransaction trn = conn.BeginTransaction())
    {
        // Insert the row first, so the .WRITE updates have a record to target.
        SqlCommand cmdInsert = new SqlCommand(
@"INSERT INTO dbo.Uploads (FileName, ContentType)
VALUES (@fileName, @contentType);
SET @id = SCOPE_IDENTITY();", conn, trn);
        cmdInsert.Parameters.AddWithValue("@fileName", "Demo");
        cmdInsert.Parameters.AddWithValue("@contentType", "application/octet-stream");
        SqlParameter paramId = new SqlParameter("@id", SqlDbType.Int);
        paramId.Direction = ParameterDirection.Output;
        cmdInsert.Parameters.Add(paramId);
        cmdInsert.ExecuteNonQuery();

        BlobStream blob = new BlobStream(
            conn, trn, "dbo", "Uploads", "FileData", "Id", paramId.Value);
        BufferedStream bufferedBlob = new BufferedStream(blob, 8040);

        HugeSerialized big = new HugeSerialized { theBigArray = new byte[1024 * 1024] };
        BinaryFormatter bf = new BinaryFormatter();
        bf.Serialize(bufferedBlob, big);

        // Flush the BufferedStream before committing, otherwise the last
        // partial chunk still sitting in its buffer would never be written.
        bufferedBlob.Flush();

        trn.Commit();
    }
}


If you monitor the execution of this simple sample you'll see that nowhere is a large serialization stream created. The sample allocates the [1024*1024] array, but that is for demo purposes, to have something to serialize. This code serializes in a buffered manner, chunk by chunk, using the SQL Server recommended BLOB update size of 8040 bytes at a time.
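
The question also asked about reading the data back, which the answer above does not cover. A complementary read path (my sketch against the same table layout, not part of the original answer) is to execute the SELECT with CommandBehavior.SequentialAccess and expose SqlDataReader.GetBytes as a forward-only Stream, so that BinaryFormatter can deserialize without the whole BLOB ever being materialized in one buffer:

// A forward-only read Stream over a SequentialAccess SqlDataReader.
// GetBytes then streams the column from the wire instead of buffering
// the entire value in memory.
class BlobReaderStream : Stream
{
    private SqlDataReader reader;
    private long position;

    public BlobReaderStream(SqlDataReader reader)
    {
        this.reader = reader;
    }

    public override int Read(byte[] buffer, int index, int count)
    {
        long read = reader.GetBytes(0, position, buffer, index, count);
        position += read;
        return (int)read;
    }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { return position; }
        set { throw new NotSupportedException(); }
    }
    public override void Flush() { }
    public override void Write(byte[] buffer, int index, int count)
    { throw new NotSupportedException(); }
    public override long Seek(long offset, SeekOrigin origin)
    { throw new NotSupportedException(); }
    public override void SetLength(long value)
    { throw new NotSupportedException(); }
}

Usage, deserializing straight off the data reader:

using (SqlCommand cmd = new SqlCommand(
    "SELECT FileData FROM dbo.Uploads WHERE Id = @id", conn))
{
    cmd.Parameters.AddWithValue("@id", id);
    using (SqlDataReader rdr = cmd.ExecuteReader(CommandBehavior.SequentialAccess))
    {
        if (rdr.Read())
        {
            BinaryFormatter bf = new BinaryFormatter();
            HugeSerialized big = (HugeSerialized)bf.Deserialize(
                new BufferedStream(new BlobReaderStream(rdr), 8040));
        }
    }
}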
