C# multiple parallel inserts in database


Question


I have a DataTable with around 3000 rows. Each of those rows needs to be inserted into a database table. Currently, I am running a foreach loop as under:

obj_AseCommand.CommandText = sql_proc;
obj_AseCommand.CommandType = CommandType.StoredProcedure;
obj_AseCommand.Connection = db_Conn;
obj_AseCommand.Connection.Open();

foreach (DataRow dr in dt.Rows)                
{
    obj_AseCommand.Parameters.AddWithValue("@a", dr["a"]);
    obj_AseCommand.Parameters.AddWithValue("@b", dr["b"]);
    obj_AseCommand.Parameters.AddWithValue("@c", dr["c"]);

    obj_AseCommand.ExecuteNonQuery();
    obj_AseCommand.Parameters.Clear();
}

obj_AseCommand.Connection.Close();


Can you please advise how I can execute the SP in parallel in the database, since the above approach takes about 10 minutes to insert 3000 rows?

Answer

Edit


In hindsight, using a Parallel.ForEach to parallelize DB insertions is slightly wasteful, as it will also consume a thread for each connection. Arguably, an even better parallel solution would be to use the asynchronous versions of the System.Data DB operations, such as ExecuteNonQueryAsync: start the executions (concurrently), and then use await Task.WhenAll() to wait upon completion. This will avoid the thread overhead to the caller, although the overall DB performance won't likely be any quicker. More here
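The async approach described above can be sketched as follows. This is a minimal illustration, not code from the original answer: `connectionString`, `sql_proc`, and the `@a`/`@b`/`@c` parameters are assumed from the question's context, and a SemaphoreSlim is used to throttle how many inserts are in flight at once.

```csharp
using System.Data;
using System.Data.SqlClient;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncInserts
{
    public static async Task InsertAllAsync(DataTable dt, string connectionString, string sql_proc)
    {
        // Throttle concurrency so we don't open 3000 connections at once.
        using (var throttle = new SemaphoreSlim(4))
        {
            // dt.AsEnumerable() requires a reference to System.Data.DataSetExtensions.
            var tasks = dt.AsEnumerable().Select(async dr =>
            {
                await throttle.WaitAsync();
                try
                {
                    // One connection per in-flight task; connection pooling keeps this cheap.
                    using (var con = new SqlConnection(connectionString))
                    using (var cmd = new SqlCommand(sql_proc, con) { CommandType = CommandType.StoredProcedure })
                    {
                        cmd.Parameters.AddWithValue("@a", dr["a"]);
                        cmd.Parameters.AddWithValue("@b", dr["b"]);
                        cmd.Parameters.AddWithValue("@c", dr["c"]);

                        await con.OpenAsync();
                        await cmd.ExecuteNonQueryAsync();
                    }
                }
                finally
                {
                    throttle.Release();
                }
            }).ToArray();

            // All inserts run concurrently (up to the throttle); no extra threads are blocked waiting.
            await Task.WhenAll(tasks);
        }
    }
}
```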

Original answer: multiple parallel inserts into the database


You can do this in parallel using the TPL, specifically with the localInit overload of Parallel.ForEach. You will almost certainly want to throttle the amount of parallelism by tweaking MaxDegreeOfParallelism so that you don't inundate your database:

Parallel.ForEach(dt.Rows.Cast&lt;DataRow&gt;(), // DataRowCollection isn't IEnumerable&lt;DataRow&gt;; Cast&lt;&gt; (System.Linq) fixes that
    // Adjust this for optimum throughput vs minimal impact to your other DB users
    new ParallelOptions { MaxDegreeOfParallelism = 4 },
    () =>
    {
        // Per-task setup: one connection + command, reused for all rows in this task's partition
        var con = new SqlConnection(connectionString);
        var cmd = con.CreateCommand();
        cmd.CommandText = sql_proc;
        cmd.CommandType = CommandType.StoredProcedure;
        con.Open();

        cmd.Parameters.Add(new SqlParameter("@a", SqlDbType.Int));
        // NB : Size-sensitive parameters must have a size
        cmd.Parameters.Add(new SqlParameter("@b", SqlDbType.VarChar, 100));
        cmd.Parameters.Add(new SqlParameter("@c", SqlDbType.Bit));
        // Prepare won't help with SPROCs but can improve plan caching for ad hoc SQL
        // cmd.Prepare();
        return new { Conn = con, Cmd = cmd };
    },
    (dr, pls, localInit) =>
    {
        // Assign to .Value — assigning to the indexer directly would replace the SqlParameter itself
        localInit.Cmd.Parameters["@a"].Value = dr["a"];
        localInit.Cmd.Parameters["@b"].Value = dr["b"];
        localInit.Cmd.Parameters["@c"].Value = dr["c"];
        localInit.Cmd.ExecuteNonQuery();
        return localInit;
    },
    (localInit) =>
    {
        // Per-task cleanup
        localInit.Cmd.Dispose();
        localInit.Conn.Dispose();
    });

Notes:

  • Unless you really know what you are doing, in general we should leave the TPL to decide on the degree of parallelism. However, depending on how much contention there is for resources (read: locks for database work), restricting the upper limit of concurrent tasks may be required. Trial and error can be useful here: try concurrencies of 4, 8, 16 tasks etc. to see which gives the most throughput, and monitor the locking and CPU load on your SQL Server.
  • Similarly, the TPL's default partitioner is usually good enough to partition the DataRows across the tasks.
  • Each task will need its own separate SQL connection.
  • Rather than creating and disposing the command on each call, create it once per task and then keep reusing the same command, just updating the parameters each time.
  • Use the localInit / localFinally lambdas to do per-task setup and cleanup, like disposing commands and connections.
  • You could also consider using .Prepare() if you are using ad hoc SQL, or SQL Server versions prior to 2005.
  • I'm assuming enumerating a DataTable's rows is thread safe. You'll want to double check this of course.
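On the last point, one way to sidestep any doubt about enumeration thread safety is to snapshot the rows into an array before handing them to Parallel.ForEach. A minimal sketch, assuming a reference to System.Data.DataSetExtensions for AsEnumerable():

```csharp
// Materialise the rows up front; Parallel.ForEach then iterates a plain
// DataRow[], so no shared DataRowCollection enumerator is involved.
DataRow[] rows = dt.AsEnumerable().ToArray();

Parallel.ForEach(rows,
    new ParallelOptions { MaxDegreeOfParallelism = 4 },
    dr =>
    {
        // ... per-row work as in the main example ...
    });
```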

Side note:


10 minutes for 3000 rows is excessive, even with a wide table and a single thread. What does your proc do? I've assumed the processing isn't trivial, hence the need for the SPROC, but if you are just doing simple inserts, then as per @3dd's comment, SqlBulkCopy will yield inserts of ~1M rows per minute on a reasonably narrow table.
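For the simple-insert case, SqlBulkCopy can load the whole DataTable in one shot. A hedged sketch; the destination table name and column names here are placeholders, not taken from the question:

```csharp
using System.Data;
using System.Data.SqlClient;

// Bulk-load the entire DataTable. "dbo.MyTable" is a hypothetical destination;
// map columns explicitly so an ordinal mismatch can't silently put data in the wrong column.
using (var con = new SqlConnection(connectionString))
{
    con.Open();
    using (var bulk = new SqlBulkCopy(con))
    {
        bulk.DestinationTableName = "dbo.MyTable";
        bulk.BatchSize = 1000; // send rows to the server in batches
        bulk.ColumnMappings.Add("a", "a");
        bulk.ColumnMappings.Add("b", "b");
        bulk.ColumnMappings.Add("c", "c");
        bulk.WriteToServer(dt);
    }
}
```

Note SqlBulkCopy performs plain inserts only; if your SPROC does non-trivial per-row processing, it isn't a drop-in replacement.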
