Performance issue with SqlBulkCopy and DataTable


Question

I need to efficiently import a large amount of data from file to database. I have a few RRF files which contain that data; a file can be larger than 400 MB, and a single file can eventually produce more than 2 million records in the database.

What I have done:


  1. I am reading the needed records into a DataTable:

using (StreamReader streamReader = new StreamReader(filePath))
{
    while (!streamReader.EndOfStream)
    {
        // Split each pipe-delimited line into its column values
        string[] values = streamReader.ReadLine().Split('|');

        int index = 0;
        var dataRow = dataTable.NewRow();

        foreach (var value in values)
        {
            dataRow[index] = value;
            index++;
        }

        dataTable.Rows.Add(dataRow);
    }
}


  2. Within a transaction (this is the key point), I insert the data from the DataTable into the database via SqlBulkCopy (a fuller sketch of the transaction scope follows at the end of the question):

    using (var bcp = new SqlBulkCopy(_sqlConnection, SqlBulkCopyOptions.Default, transaction))
    {
        bcp.DestinationTableName = tableName;
        bcp.WriteToServer(dataTable);
    }
    


  3. The problem is that, since each DataTable can contain more than 2 million records, it takes too much RAM (around 2 GB) to store the DataTable.

Something like

    dataTable.Dispose();
    dataTable = null;

    GC.Collect();

doesn't really help.

The BatchSize property of SqlBulkCopy has nothing to do with it; all the memory is taken by the DataTable, which stores the rows that should be inserted.

I wonder, is there an efficient way to read the data and use SqlBulkCopy with it?
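
For reference, the transaction scope around the bulk copy in step 2 could look like the following; this is a minimal sketch assuming _sqlConnection is an already-open SqlConnection (System.Data.SqlClient), with the other names taken from the snippets above:

    using (SqlTransaction transaction = _sqlConnection.BeginTransaction())
    {
        try
        {
            // assumes _sqlConnection is already open and dataTable matches
            // the destination table's schema
            using (var bcp = new SqlBulkCopy(_sqlConnection, SqlBulkCopyOptions.Default, transaction))
            {
                bcp.DestinationTableName = tableName;
                bcp.WriteToServer(dataTable);
            }
            transaction.Commit();
        }
        catch
        {
            transaction.Rollback(); // keep the import all-or-nothing
            throw;
        }
    }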

Answer

In my experience, the optimal DataTable size for bulk inserting is somewhere between 60,000 and 100,000 rows. Also, I found reusing a DataTable to be slower than cloning a fresh one: DataTable.Rows.Clear() doesn't clear the constraints, and adding new rows is much slower after the first bulk insert. DataTable.Clear() is much better, but starting with a fresh DataTable for each bulk insert was the fastest.

So your code would look like:

    int batchSize = 65000;

    // ... inside the read loop, after adding each row ...
    bool lastLine = streamReader.EndOfStream;

    if (dataTable.Rows.Count == batchSize || lastLine)
    {
        bcp.WriteToServer(dataTable); // bulk insert the current batch

        DataTable temp = dataTable.Clone(); // copies the schema, not the rows
        dataTable.Dispose();
        dataTable = temp;
    }
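
Putting the pieces together, the whole read-and-flush loop might look like the sketch below. It assumes dataTable has already been created with the destination schema, and reuses filePath, tableName, _sqlConnection and transaction from the question:

    using (StreamReader streamReader = new StreamReader(filePath))
    using (var bcp = new SqlBulkCopy(_sqlConnection, SqlBulkCopyOptions.Default, transaction))
    {
        bcp.DestinationTableName = tableName;
        int batchSize = 65000;

        while (!streamReader.EndOfStream)
        {
            string[] values = streamReader.ReadLine().Split('|');

            DataRow dataRow = dataTable.NewRow();
            for (int i = 0; i < values.Length; i++)
                dataRow[i] = values[i];
            dataTable.Rows.Add(dataRow);

            // Flush full batches, plus the final partial batch at end of file
            if (dataTable.Rows.Count == batchSize || streamReader.EndOfStream)
            {
                bcp.WriteToServer(dataTable);

                DataTable temp = dataTable.Clone(); // fresh table, same schema
                dataTable.Dispose();
                dataTable = temp;
            }
        }
    }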
    

In addition to that, you can separate the bulk inserting into its own thread: your file-reading thread produces DataTable objects that your bulk-insert thread consumes. You have to add semaphores to make sure the file-reading thread doesn't over-produce, otherwise you will use too much memory.

Here's an example of the producer/consumer code. Feel free to make improvements to it.

You can play around with the sleep times to see how the code waits either on the producer side or the consumer side.

    using System;
    using System.Collections.Generic;
    using System.Text;
    using System.Threading;

    public static void produce() {

        DataObject o2 = new DataObject();
        Thread t = new Thread(consume);
        t.Start(o2);

        for (int i = 0; i < 10; i++) {
            // Back-pressure: stop reading while the consumer is behind
            if (o2.queue.Count > 2) {
                lock(o2.sb)
                    o2.sb.AppendLine("3 items read, waiting for consume to finish");

                o2.wait.Set();
                o2.parentWait.WaitOne();
                o2.parentWait.Reset();
            }

            Thread.Sleep(500); // simulate reading data

            lock(o2.sb)
                o2.sb.AppendLine("Read file: " + i);

            lock(o2.queue) {
                o2.queue.Add(i);
            }
            o2.wait.Set(); // signal the consumer that an item is available
        }

        o2.finished = true;
        o2.wait.Set();
    }

    public class DataObject {
        public bool finished = false;
        public List<int> queue = new List<int>();
        public ManualResetEvent wait = new ManualResetEvent(false);
        public ManualResetEvent parentWait = new ManualResetEvent(false);
        public StringBuilder sb = new StringBuilder();
    }

    public static void consume(Object o) {
        DataObject o2 = (DataObject) o;

        while (true) {
            if (o2.finished && o2.queue.Count == 0)
                break;

            if (o2.queue.Count == 0) {
                lock(o2.sb)
                    o2.sb.AppendLine("Nothing in queue, waiting for produce.");
                o2.wait.WaitOne();
                o2.wait.Reset();
                continue; // re-check: the producer may have finished without queueing anything
            }

            Object val = null;
            lock(o2.queue) {
                val = o2.queue[0];
                o2.queue.RemoveAt(0);
            }

            o2.parentWait.Set(); // notify the producer to produce more

            lock(o2.sb)
                o2.sb.AppendLine("Loading data to SQL: " + val);

            Thread.Sleep(500); // simulate the bulk insert
        }
    }
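
As a side note, on .NET 4 and later the same bounded producer/consumer hand-off can be written with BlockingCollection<T>, whose bounded capacity plays the role of the semaphore mentioned above. This is an alternative sketch rather than part of the original answer; the class name, batch size, and capacity values are illustrative:

    using System;
    using System.Collections.Concurrent;
    using System.Data;
    using System.Threading.Tasks;

    public static class BulkLoader
    {
        public static void Run()
        {
            // Bounded to 3 pending batches: Add blocks the reader when the
            // consumer falls behind, which keeps memory usage capped.
            using (var batches = new BlockingCollection<DataTable>(boundedCapacity: 3))
            {
                Task consumer = Task.Run(() =>
                {
                    foreach (DataTable batch in batches.GetConsumingEnumerable())
                    {
                        // bulk insert 'batch' here (e.g. with SqlBulkCopy), then release it
                        batch.Dispose();
                    }
                });

                // Producer: read the file, fill a DataTable of ~65,000 rows,
                // then hand it off with batches.Add(filledTable).

                batches.CompleteAdding(); // signal end of input
                consumer.Wait();
            }
        }
    }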
    

