Processing/calculating data in realTime


Problem Description






Hi,

I'm trying to process live stockMarket data and insert and update a database with the results. I'm using the consumer/producer queue design pattern, which I have threaded.

Some of the calculations are VERY intensive and are degrading the performance of the database. I can't seem to figure out how to go about processing the data and inserting/updating the database with the results.

Can someone please give me advice on how to go about setting this up properly?

Thanks,
-Donald
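The pattern described, a threaded producer/consumer queue feeding a calculation stage, can be sketched minimally as follows (Python rather than the asker's C#, purely for a compact illustration; the names `tick`, `producer` and `consumer`, and the toy calculation, are invented for the example):

```python
import queue
import threading

work_q = queue.Queue(maxsize=1000)   # bounded, so a slow consumer backpressures the producer
results = []
done = object()                      # sentinel telling the consumer to stop

def producer(ticks):
    for tick in ticks:               # in real life: read from a market-data feed
        work_q.put(tick)
    work_q.put(done)

def consumer():
    while True:
        item = work_q.get()
        if item is done:
            break
        results.append(item * 2)     # stand-in for the "VERY intensive" calculation
        work_q.task_done()

t_prod = threading.Thread(target=producer, args=(range(5),))
t_cons = threading.Thread(target=consumer)
t_prod.start(); t_cons.start()
t_prod.join(); t_cons.join()
print(results)                       # [0, 2, 4, 6, 8]
```

The bounded queue is the important design choice: if the calculation or the database writes cannot keep up, the producer blocks instead of the backlog growing without limit.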

Recommended Answer

This is a very general idea I am throwing in. If, and it's an important "if", part of the slowness is due to the managed code, you may want to move some of the more intensive calculations into a fast library written in C++. You could call into it via COM or C++/CLI (among other options).


I can see that your heavy calculation part could compromise the total throughput of the system, but I don't see why it has to degrade the performance of the database. What is the bottleneck: the calculations themselves, or the additional transactions for intermediate results? If the transactions are the bottleneck, you need to cache the data. I cannot believe you can do correct calculations against an ever-changing database anyway.
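One way to act on "cache the data": instead of a transaction per intermediate result, accumulate results and flush them in batches, one transaction per batch. A minimal sketch (Python with the stdlib `sqlite3` module standing in for the real database; the schema, batch size and data are invented for the example):

```python
import sqlite3

# Batch intermediate results instead of issuing one transaction per row.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE results (symbol TEXT, value REAL)")

BATCH = 100
pending = []

def emit(symbol, value):
    """Cache a result; hit the database only when a full batch accumulates."""
    pending.append((symbol, value))
    if len(pending) >= BATCH:
        flush()

def flush():
    if pending:
        with db:  # one transaction for the whole batch
            db.executemany("INSERT INTO results VALUES (?, ?)", pending)
        pending.clear()

for i in range(250):                 # 250 results -> 2 batched commits + 1 final flush
    emit("ACME", float(i))
flush()

count = db.execute("SELECT COUNT(*) FROM results").fetchone()[0]
print(count)  # 250
```

The database sees 3 commits instead of 250; the trade-off is that a crash loses at most one batch of uncommitted results.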

If you are already developing the consumer/producer queue approach, you can more or less easily move a big part of the processing onto another machine. I would suggest you dedicate a separate tier just for your calculation part. It can run on a separate machine and increase parallelism.

—SA


Since I have done similar things at university, I think I know where your problem is.
For instance, I did some testing (C#) on just a few hundred thousand datasets on a SQL developer machine. The performance was damn slow compared with a Perl solution using in-memory and simple file-based storage.

I remember one weekend my multithreaded app was blocking the whole multicore system and the university backbone. This Perl program I wrote some time ago was fetching stock data from servers around the world, comparing terabytes of data again and again: extracting, filtering, completing and extrapolating data, and even processing some images for visualization. One thing I can tell you is that a well-designed program with no database at all, interpreted by a well-chosen scripting language like Perl (which is known for fast parsing), can easily outperform any precompiled high-level managed-code application. It's about choosing the right tools for a certain task.

From my current point of view, for this kind of application (high data volume, high access rates, complex operations, which I call "hidaco", and in my case image processing), the standard database-programming approach is a NO-GO! Personally I think database performance is well overestimated. Though financial matters are most often handled with transaction models because of reliability, this is a fatal choice when it comes to performance considerations. Well, my approach was to reduce database activity to the minimum (meaning zero: I wrote my own). For you, that means doing some caching and maybe building a kind of database of your own, or better, consider using an in-memory database (see Google). Since recursive computations like neural networks and AI (as in AForge or OpenCV) are much more intense than (well-defined and deterministic) financial math, the computation is (IMHO) not your bottleneck, nor is the managed code. Any SQL may become a bottleneck very easily. Try at least two in-memory databases (see "in-memory database" on Wikipedia for a list). If your performance increases, you should redesign your SQL statements to get the most out of them. Well, I bet it will increase tremendously, but if it does not, take the C++ route (use an external financial-math library with a C# wrapper) for performance testing.
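As a concrete example of the in-memory-database suggestion: SQLite can run entirely in memory via the `":memory:"` connection string, which makes it an easy first candidate to benchmark against a disk-backed server (Python stdlib `sqlite3` shown; the table and data are invented):

```python
import sqlite3

mem = sqlite3.connect(":memory:")    # the whole database lives in RAM, nothing touches disk
mem.execute("CREATE TABLE quotes (symbol TEXT, price REAL)")
mem.executemany("INSERT INTO quotes VALUES (?, ?)",
                [("ACME", 101.5), ("ACME", 102.0), ("INIT", 55.25)])

# an index is one of the "redesign your SQL statements" levers mentioned above
mem.execute("CREATE INDEX idx_symbol ON quotes(symbol)")

avg = mem.execute(
    "SELECT AVG(price) FROM quotes WHERE symbol = ?", ("ACME",)
).fetchone()[0]
print(avg)  # 101.75
```

The same schema and queries can then be pointed at the real server to compare, which separates "SQL is the bottleneck" from "disk and transaction overhead is the bottleneck".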

Another approach would be to expand your SQL Server / database capabilities. There is a YouTube video out there about YouTube's own sizing problems during different periods of growth; just a hint, but it takes me to the last point ;-)

One last word on common pitfalls. I assume that processing live stock data means fetching data over some kind of network!? Please be aware of any limits on connection handling, starting with maximum simultaneous connections/sockets/ports, bandwidth issues, packet/session timeouts and misconfiguration (even on the physical side -> the network), and whatever else may come up.

And last but not least, let us know how it goes.

