用螺纹执行大批量插入 [英] Executing high volume inserts with threads

查看:80
本文介绍了用螺纹执行大批量插入的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述



我正在尝试执行插入到两个不同表中的两个不同线程.我遇到的问题是我的整个CPU使用率都被消耗掉了(100%)...我绝对可以确定我编写的代码中充满了错误和不正确.有人可以解释/告诉我解决此问题的正确方法吗?

谢谢!

-唐纳德​​

这是示例代码:



I''m trying to execute two different threads that are inserting into two different tables. The problem I''m having is that my entire CPU usage is being consumed (100%)...I''m definitely sure that the code I''ve written is full of errors and incorrect. Can someone please explain/show me the correct way to go about this.

Thank you!

-Donald

here''s the sample code:

Data Tier:

    class DataAccessLayer
    {
        public SqlConnection conn;
        public SqlCommand cmd;
        public int bidPrice;
        public int askPrice;
        public int lastPrice;

        public DataAccessLayer()
        {
            conn = new SqlConnection("Server = localhost; Database = testdb; uid = root; pwd = password; port = 0000;");
            cmd = new SqlCommand();
            try
            {
                conn.Open();
                cmd.Connection = conn;
            }
            catch
            {
                throw new Exception("Connection Error...");
            }
        }

        public void DataManipulation(object str)
        {
            string s = (string)str;
            List<string> data = new List<string>(s.Split('',''));
            bidPrice = int.Parse(data[0]);
            askPrice = int.Parse(data[1]);
            lastPrice = int.Parse(data[2]);
        }

        public void Insert()
        {
            lock (this)
            {
                try
                {
                    cmd.CommandText = string.Format("INSERT INTO test_tbl VALUES (NULL,{0},{1},{2});", bidPrice, askPrice, lastPrice);
                    cmd.ExecuteNonQuery();
                }
                catch (SqlException ex)
                {
                    throw ex;
                }
            }
        }

        public void Update()
        {
            lock (this)
            {
                try
                {
                    cmd.CommandText = string.Format("INSERT INTO test_tbl2 VALUES (NULL,{0},{1},{2});", bidPrice, askPrice, lastPrice);
                    cmd.ExecuteNonQuery();
                }
                catch (SqlException ex)
                {
                    throw ex;
                }
            }
        }
    }



主要代码:



Main code:

public partial class Form1 : Form
    {
        DataAccessLayer dataAccess;
        Thread updateThread;
        Thread insertThread;
        Thread dataThread;
        string dataString;

        public Form1()
        {
            InitializeComponent();
        }
        private void Form1_Load(object sender, EventArgs e)
        {
            try
            {
                dataAccess = new DataAccessLayer();
            }
            catch(Exception ex)
            {
                MessageBox.Show(ex.Message);
            }
        }
        private void onQuote()
        {
            while (true)
            {
                Random rnd = new Random();
                int bid = rnd.Next(10);
                int ask = rnd.Next(10);
                int last = rnd.Next(10);
                dataString = string.Format("{0},{1},{2}", bid, ask, last);
                updateThread = new Thread(new ThreadStart(dataAccess.Update));
                insertThread = new Thread(new ThreadStart(dataAccess.Insert));
                dataThread = new Thread(new ParameterizedThreadStart(dataAccess.DataManipulation));
                dataThread.Priority = ThreadPriority.AboveNormal;
                dataThread.Start(dataString);
                insertThread.Start();
                updateThread.Start();
            }
        }
        private void btnStart_Click(object sender, EventArgs e)
        {
            onQuote();
        }
    }

推荐答案

难怪您会遇到CPU消耗问题.我还认为单击btnStart后您的UI会永远挂着,对吗?

我感谢您成熟的理解,即您的设计目前还远远不够完美.我可以告诉你,没有比这更正确的假设了:).

首先,我可以说方法onQuote本身应该运行的是单独的线程.并且此线程比运行UpdateInsertDataManipulation的线程更重要;它还需要终止条件.但是下一步,我们需要了解是否完全需要像onQuite这样的方法-我认为不是.

首先,您尝试创建不需要的并行性.另外,您还创建了不可预测的线程数:您创建了无条件线程.尝试计算当前处于活动状态的线程数-结果是不可预测的.想象一下,从线程开始到结束的时间比onQuote的循环快(这是现实的).创建线程的速度更快,然后它们退出处理.这意味着实际上,您将创建无限数量的线程,直到使OS承受超出正常操作限制的资源压力为止.这不是100%的CPU消耗,更糟糕的是.但是真正令我惊讶的是:您是否真的采用了并行性并获得了任何性能?充其量只是在最开始的时候.大多数情况下,由于系统资源过度消耗而使速度变慢(创建线程太昂贵,以致于无法循环运行此API).

在我们进行下一步之前,让我提前给您一些提示:
我想说,您需要固定数量的线程;并且您需要创建大约3 * N个线程,其中N是表的数量.这是非常近似的合理估计;并且这种线程安排可能会根据您的任务细节而有所不同.另外,您需要在开始时创建所有线程,并且只启动一次.线程应仅在其输入数据馈入线程后才保持睡眠和唤醒状态.这意味着您需要使用一些同步原语(建议使用EventWaitHandle)以及它们之间的数据队列.原则上,您可以使用某些消息传递系统和/或WCF之类的选项,但是我希望手动设计这些数据队列和同步-不太难.这看起来很复杂,但让我解释一下:另一种方法是让您的线程摆脱束缚,消耗掉周围的一切.

另一个问题是线程优先级.通常,建议不要触摸它.这无济于事.对于您而言,这当然是正确的规则.

在进一步详细介绍之前,让我告诉您一个问题.看来您通过一些实验性玩具问题为一些明智的任务建模.您认为它足以证明和评估这些技术吗?它不是.这足以说明疯狂使用所有线程,但对于更平衡的设计,最好概述更现实的目标.我不知道我希望您对实际的应用程序目标有所帮助.只要考虑一下,我们就可以继续进行详细介绍...



当其他人提出一些类似的问题时,我决定添加另一个答案,这次是使用一些代码.它将提供我上面提到的线程间通信的一些详细信息.

谢谢.
--SA
No wonder you have problem of CPU consumption. I also think that your UI is hanging forever after btnStart clicked, is that right?

I appreciate you mature understanding that your design is far from perfect at the moment. I can tell you could not make more correct assumption than that :) .

To give you a first hint, I can tell that the method onQuote itself should run is separate thread; and this thread is more important that your threads running Update, Insert and DataManipulation; it also need a condition for termination. But as a next step, we need to understand if we need the method like onQuite at all -- I don''t think so.

First of all, you try to create unwanted parallelism. Also you created unpredictable number of threads: you created them with not condition. Try to count how many thread are currently active -- result is unpredictable. Imagine the time between start and finish of your threads is faster than the loop of onQuote (which can be realistic). You create thread faster then they are exit their processing. It means you will create infinite number of thread, in practice, until you get your OS into resource stress beyond normal operational limits. It''s not 100% CPU consumption, it''s worse. But what really amaze me is: do you really employ the parallelism and gain any performance? Only in the very beginning at best; most of the time you slow down due to over-consumption of your system resources (creation of the thread along is too expensive to run this API in cycle).

Before we go any further, let me give you few hints in advance:
I would say, you need fixed number of threads; and in you need to created about 3*N thread, where N is the number of tables. This is very approximate reasonable estimate; and such thread arrangement is may vary depending of your task detail. Also, you need to creates all threads at the very beginning and start only once. Threads should be kept sleeping and wake up only when their input data is fed to them. It means you need to use some synchronization primitives (I would suggest EventWaitHandle) and the queues of data between them. In principle, you can use some messaging system and/or such option as WCF, but I would prefer to design those data queues and synchronization by hand -- it''s not too hard. It looks like a complication, but let me explain: the alternative will be letting your thread off the leash and consuming everything around.

Another issue is thread priority. Generally, it is recommended not to touch it. It would not help. In your case, this is certainly a correct rule.

Before going to further detail, let me tell you about one problem. It looks like you model some sensible task with some experimental toy problem. Do you think it it realistic enough for demonstration and evaluation of the techniques? It is not. It''s good enough for demonstration of crazy consuming-all threading, but for more balanced design it''s good to overview more realistic goal. I don''t know what. I would prefer you lean a bit to your real-life application goals. Just think about it, then we can continue with some detail...



As other people asked somewhat similar questions, I decided to add another answer, this time with some code. It will provide some details on inter-thread communications I mentioned above.

Thank you.
--SA


提供了唐纳德(Donald)提出的澄清要求(嗯,我看不到他的其他问题),我想继续编写一些代码.

我发现其他一些询问者以类似的方式问我可以回答的问题,因此我决定以某种形式提供解决方案的一部分,如果有一些用于线程间通信的通用队列类.

请在Tpis& amp;中查看我的微型作品.技巧:
用于线程通信和线程间调用的简单阻塞队列 [ ^ ].

它同时涵盖两种类型的问题

1)如何使用固定数量的永久"应用程序线程而不使它们中的任何一个空闲运行?这应该显示如何解决100%CPU消耗的问题(更确切地说,在没有等待/睡眠条件的情况下运行一个线程通常会消耗100%的CPU内核).

2)如何实现类似于System.Windows.Forms.Control.InvokeSystem.Windows.Threading.Dispatcher所使用的线程间调用技术.
Provided my clarification asked by Donald is accepted (well, I don''t see further questions from him), I want to proceed with some code.

I found that some other Inquirers ask questions I can answer in a very similar way, I decided to provide part of the solution in the form if some generic queue class for inter-thread communication.

Please see my miniature work in Tpis & Tricks:
Simple Blocking Queue for Thread Communication and Inter-thread Invocation[^].

It covers two types of issues at the same time

1) How to work with fixed number of "permanent" application threads without idle run of any of them? This should show how to solve a problem of 100% CPU consumption (more exactly, running one thread without wait/sleep condition would typically consume 100% of one of CPU cores).

2) How to implement a technique of inter-thread Invoke similar to that used by System.Windows.Forms.Control.Invoke or System.Windows.Threading.Dispatcher.


唐纳德,

我认为您应该自己购买面向模式的软件体系结构:并行和联网对象的模式(POSA2) [ ^ ],以及 Douglas C. Schmidt [
.NET框架中实现了许多POSA模式,这些书将帮助您决定和理解哪种机制最适合解决您正在研究的问题.

我在
>如何执行中概述的解决方案大容量插入并同时查询表 [ Sarbanes–Oxley法案 [ ^ ]和其他法律事项.

接下来要看的可能是财务信息交换(FIX)协议 [ http://www.quickfixengine.org/ [ Len Silverston [ ^ ]为参考模型提供可靠的记录.


问候
Espen Harlinn
Hi Donald,

I think you should buy youself a copy of Pattern-Oriented Software Architecture: Patterns for Concurrent and Networked Objects (POSA2)[^], and The ACE Programmer''s Guide: Practical Design Patterns for Network and Systems Programming[^]. The latter is C++ oriented, but it will show you how to build flexible and robust software.

The POSA books are considered among the most important works on software patterns, and you can think of a pattern as a proven recipe for building software that does what it''s supposed to do.

ACE is a C++ library, and IMHO one of the finer examples of software engineering. ACE is an open source project lead by Douglas C. Schmidt[^]. Over the years several thousand developers has contributed to this framework. ACE (and TAO) shows how patterns can be used to create "real" high performance, robust and maintainable software.

Many of the POSA patterns are implemented in the .Net framework, and the books will help you decide on, and understand, which mechanisms are best suited to solve the problems you are working on.

The solution I outlined in How to execute high volume inserts and query the table simultaneously[^], Leverages .Net worker threads; which is, as Nishant Sivakumar mentioned, a suitable mechanism for solving your problem.

From what I can see from your code, it seems like you are working on some kind of trading system.

If it’s for real, as in money will change hands, you have a long journey ahead of you. If this is the case - I suggest you start with reading up on the Sarbanes–Oxley Act[^] and other legal matters.

Next thing to look at is probably the Financial Information eXchange (FIX) Protocol[^], a messaging standard developed specifically for the real-time electronic exchange of securities transactions. You''ll find an implementation at http://www.quickfixengine.org/[^], it can be used with .Net.

At this point things might seem a bit overwhelming, you need some kind of data model to build upon, because the complexity of the problem is growing exponentially. Len Silverston[^] provides reference models with a proven track record.


Regards
Espen Harlinn


这篇关于用螺纹执行大批量插入的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆