Advice for converting a large monolithic single-threaded application to a multithreaded architecture?


Question

My company's main product is a large monolithic C++ application used for scientific data processing and visualisation. Its codebase goes back maybe 12 or 13 years, and while we have put work into upgrading and maintaining it (adopting the STL and Boost, since most containers were custom when I joined; fully upgrading to Unicode and the 2010 VCL; and so on), one very significant problem remains: it's fully single-threaded. Given that it's a data processing and visualisation program, this is becoming more and more of a handicap.

I'm both a developer and the project manager for the next release where we want to tackle this, and this is going to be a difficult job in both areas. I'm seeking concrete, practical, and architectural advice on how to tackle the problem.

The program's data flow might go something like this:

  • a window needs to draw data
  • In the paint method, it will call a GetData method, often hundreds of times for hundreds of bits of data in one paint operation
  • This will go and calculate or read from file or whatever else is required (often quite a complex data flow - think of this as data flowing through a complex graph, each node of which performs operations)

I.e., the paint message handler blocks while processing is done, and if the data hasn't already been calculated and cached, this can take a long time, sometimes minutes. Similar paths occur in other parts of the program that perform lengthy processing operations: the program is unresponsive for the entire time, sometimes hours.

I'm seeking advice on how to approach changing this. Practical ideas. Perhaps things like:

  • design patterns for asynchronously requesting data?
  • storing large collections of objects such that threads can read and write safely?
  • handling invalidation of data sets while something is trying to read it?
  • are there patterns and techniques for this sort of problem?
  • what should I be asking that I haven't thought of?

I haven't done any multithreaded programming since my Uni days a few years ago, and I think the rest of my team is in a similar position. What I knew was academic, not practical, and is nowhere near enough to have confidence approaching this.

The ultimate objective is to have a fully responsive program, where all calculations and data generation are done in other threads and the UI is always responsive. We might not get there in a single development cycle :)


Edit: I thought I should add a couple more details about the app:

  • It's a 32-bit desktop application for Windows. Each copy is licensed. We plan to keep it a desktop, locally-running app
  • We use Embarcadero (formerly Borland) C++ Builder 2010 for development. This affects the parallel libraries we can use, since most seem (?) to be written for GCC or MSVC only. Luckily they're actively developing it and its C++ standards support is much better than it used to be. The compiler supports these Boost components.
  • Its architecture is not as clean as it should be and components are often too tightly coupled. This is another problem :)

Edit #2: Thanks for the replies so far!

  • I'm surprised so many people have recommended a multi-process architecture (it's the top-voted answer at the moment), not multithreading. My impression is that's a very Unix-ish program structure, and I don't know anything about how it's designed or works. Are there good resources available about it, on Windows? Is it really that common on Windows?
  • In terms of concrete approaches to some of the multithreading suggestions, are there design patterns for asynchronously requesting and consuming data, or thread-aware or asynchronous MVP systems, or guidance on how to design a task-oriented system, or articles, books and post-release deconstructions illustrating things that work and things that don't? We can develop all this architecture ourselves, of course, but it's good to work from what others have done before and know what mistakes and pitfalls to avoid.
  • One aspect that isn't touched on in any of the answers is project managing this. My impression is that estimating how long this will take, and keeping good control of the project while doing something this uncertain, may be hard. That's one reason I'm after recipes or practical coding advice, I guess, to guide and restrict coding direction as much as possible.

I haven't yet marked an answer for this question. This is not because of the quality of the answers, which is great (and thank you), but simply because, given the scope of this, I'm hoping for more answers or discussion. Thank you to those who have already replied!

Solution

So, there's a hint in your description of the algorithm as to how to proceed:

often quite a complex data flow - think of this as data flowing through a complex graph, each node of which performs operations

I'd look into making that data-flow graph be literally the structure that does the work. The links in the graph can be thread-safe queues, and the algorithms at each node can stay pretty much unchanged, except that each is wrapped in a thread that picks up work items from one queue and deposits results on another. You could go a step further and use sockets and processes rather than queues and threads; this will let you spread the work across multiple machines if there is a performance benefit in doing so.
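As a rough illustration of the queue-linked graph described above, here is a minimal sketch. It assumes C++11 (`std::thread`, `std::mutex`, `std::condition_variable`), which C++ Builder 2010 may not provide; Boost.Thread offers `boost::thread`, `boost::mutex` and `boost::condition_variable` for the same roles. The names `WorkQueue`, `start_node` and `run_pipeline_demo` are made up for this example, and the two node algorithms are trivial stand-ins for real processing steps.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical blocking queue used as a link between two graph nodes.
template <typename T>
class WorkQueue {
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(item));
        }
        ready_.notify_one();
    }
    // Marks the end of the stream so a waiting consumer can stop.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }
    // Blocks until an item arrives; returns false once closed and drained.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        return true;
    }
private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
    bool closed_ = false;
};

// Wraps an otherwise unchanged single-threaded algorithm in a worker thread.
template <typename In, typename Out>
std::thread start_node(WorkQueue<In>& input, WorkQueue<Out>& output,
                       std::function<Out(const In&)> algorithm) {
    return std::thread([&input, &output, algorithm] {
        In item;
        while (input.pop(item)) output.push(algorithm(item));
        output.close();  // propagate end-of-stream to the next node
    });
}

// Demo graph: source -> scale -> offset -> results.
std::vector<double> run_pipeline_demo() {
    WorkQueue<double> source, scaled, results;
    std::thread scale = start_node<double, double>(
        source, scaled, [](const double& x) { return x * 2.0; });
    std::thread offset = start_node<double, double>(
        scaled, results, [](const double& x) { return x + 1.0; });
    for (double x : {1.0, 2.0, 3.0}) source.push(x);
    source.close();
    std::vector<double> out;
    double value = 0.0;
    while (results.pop(value)) out.push_back(value);
    scale.join();
    offset.join();
    assert(out.size() == 3);
    return out;
}
```

Calling `run_pipeline_demo()` from `main` pushes three values through both nodes. With one worker per node, the FIFO queues preserve order; running several workers on the same node would need some other way to reassemble results.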

Then your paint and other GUI methods need to be split in two: one half to queue the work, and the other half to draw or use the results as they come out of the pipeline.
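One possible shape for that split is sketched below, again assuming C++11 rather than anything C++ Builder 2010 is known to ship. `AsyncDataCache`, `try_get` and `collect_finished` are hypothetical names, and `std::async` stands in for whatever pipeline or thread pool actually does the work. The paint handler would call `try_get` and draw a placeholder when it returns false; a timer or posted window message would call `collect_finished` and invalidate the window when something has arrived.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <map>
#include <thread>
#include <utility>

// Hypothetical UI-side cache. Every method is called from the UI thread;
// only the compute function runs on another thread.
class AsyncDataCache {
public:
    explicit AsyncDataCache(std::function<double(int)> compute)
        : compute_(std::move(compute)) {}

    // First half, called from paint. Never blocks. Returns true and fills
    // `out` if the value is cached; otherwise starts the work (once per key)
    // and returns false so the caller can draw a placeholder.
    bool try_get(int key, double& out) {
        auto done = ready_.find(key);
        if (done != ready_.end()) {
            out = done->second;
            return true;
        }
        if (pending_.find(key) == pending_.end())
            pending_.emplace(key, std::async(std::launch::async, compute_, key));
        return false;
    }

    // Second half, called from the UI thread when it is idle or notified.
    // Moves finished results into the cache and returns how many arrived,
    // so the caller knows whether a repaint is worthwhile.
    int collect_finished() {
        int arrived = 0;
        for (auto it = pending_.begin(); it != pending_.end();) {
            if (it->second.wait_for(std::chrono::seconds(0)) ==
                std::future_status::ready) {
                ready_[it->first] = it->second.get();
                it = pending_.erase(it);
                ++arrived;
            } else {
                ++it;
            }
        }
        return arrived;
    }

private:
    std::function<double(int)> compute_;
    std::map<int, double> ready_;
    std::map<int, std::future<double>> pending_;
};

// Demo: the loop stands in for repeated paint messages.
double run_paint_demo() {
    AsyncDataCache cache([](int key) { return key * 1.5; });
    double value = 0.0;
    bool ready_first = cache.try_get(4, value);  // empty cache: work is started
    assert(!ready_first);
    (void)ready_first;
    while (!cache.try_get(4, value)) {
        cache.collect_finished();
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    return value;
}
```

This sketch launches one task per missing key, which would not scale to hundreds of `GetData` calls per paint; a real version would more likely feed a fixed set of workers. Note also that a future returned by `std::async` blocks in its destructor until the task finishes, so destroying the cache waits for outstanding work.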

This may not be practical if the app presumes that data is global. But if it is well contained in classes, as your description suggests it may be, then this could be the simplest way to get it parallelised.
