在连接之间共享事务 [英] Sharing transactions between connections

查看:106
本文介绍了在连接之间共享事务的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个FileShare搜寻器(获取权限并将其放到某个地方以供以后审核).当前,它正在启动多个线程来爬网同一文件夹(以加快进程).

I have a FileShare crawler (getting permissions and dropping them somewhere for later Audit). Currently it is starting multiple threads to crawl the same folder (to speed up the process).

在C#中,每个SqlConnection对象都有自己的SqlTransaction,由SqlConnection.BeginTransaction()调用启动.

In C#, each SqlConnection object has its own SqlTransaction, initiated by the SqlConnection.BeginTransaction() call.

这是当前解决方案的伪代码:

Here is the pseudo code of the current solution:

  1. 获取文件夹列表
  2. 对于每个文件夹,获取子文件夹列表
  3. 对于每个子文件夹,启动一个线程来收集文件共享
  4. 每个线程会将收集到的数据保存到数据库中
  5. 在数据库上运行审核报告

当子文件夹线程之一发生故障时,就会出现此问题.最后,我们进行了部分文件夹扫描,无法轻松检测到".主要原因是每个线程都在单独的连接上运行.

The problem arise when one of the sub folders threads fails. We end up with partial folder scanning which "cannot be detected easily". The main reason is that each thread is running on a separate connection.

我希望每个文件夹都在同一事务中提交,而不要进行不完整的扫描(当前情况,某些线程失败).没有实施任何交易概念,但我正在评估期权.

I would like to have each folder to be committed in the same transaction rather than having incomplete scanning (current situation, when some threads fail). No transaction concept is implemented but I am evaluating the options.

根据此答案的评论,生产者/消费者队列将是一个选择,但不幸的是,内存是一个限制(由于已启动线程的数量).如果将生产者/消费者空间分配给磁盘以克服RAM限制,则执行时间将增加(由于磁盘I/O与内存I/O相比非常有限).我想我在记忆/时间妥协上受了困扰.还有其他建议吗?

Based on the comments of this answer, the producer/consumer queue would be an option but unfortunately memory is a limit (due to the number of started threads). In case the producer/consumer space is committed to disk to overcome the RAM limit, the execution time will go up (due to the very limited disk I/O compared to memory I/O). I guess I am stuck with a memory/time compromise. Any other suggestions?

推荐答案

使用过时的绑定事务功能,可以与SQL Server在多个连接上共享同一事务.我从未使用过它,也不会基于它进行新的开发.这里似乎也没有必要.

It is possible to share the same transaction on multiple connections with SQL Server using the obsolete bind transaction feature. I have never used it and I wouldn't base new development on it. It also seems unnecessary here.

您是否可以让所有生产者都使用相同的连接和事务?在它周围放一个锁.这显然是过程的瓶颈,但它可能仍然足够快.

Can't you just have all the producers use the same connection and transaction? Put a lock around it. This obviously bottlenecks the process but it might still be fast enough.

您说您执行了INSERT语句.对于批量插入,您可以使用SqlBulkCopy类,该类要快得多.分批处理行,仅在缓冲>> 1000行后才执行批量插入.

You say you execute INSERT statements. For bulk inserts you can use the SqlBulkCopy class which is very much faster. Batch up the rows and only execute a bulk insert when you have >>1000 rows buffered.

我什至没有看到生产者/消费者的需要.通过流水线化生产和消耗,确实可以使性能受益,但是它也引入了更为复杂的线程.如果要走这条路线,您可能应该给SqlBulkCopy类提供IEnumerable<SqlDataRecord>,以直接将生成的所有行流式传输到其中,而无需中间缓冲.

I don't even see the need for producer/consumer here. It would indeed benefit performance by pipelining production with consumption but it also introduces far more complex threading. If you want to go this route you should probably give an IEnumerable<SqlDataRecord> to the SqlBulkCopy class to directly stream all rows that have been produced into it without intermediate buffering.

这篇关于在连接之间共享事务的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆