Advice for converting a large monolithic single-threaded application to a multithreaded architecture?

Problem description

My company's main product is a large monolithic C++ application used for scientific data processing and visualisation. Its codebase goes back maybe 12 or 13 years, and while we have put work into upgrading and maintaining it (use of STL and Boost, since most containers were custom when I joined; a full upgrade to Unicode and the 2010 VCL; and so on), one very significant problem remains: it's fully single-threaded. Given that it's a data processing and visualisation program, this is becoming more and more of a handicap.

I'm both a developer and the project manager for the next release where we want to tackle this, and this is going to be a difficult job in both areas. I'm seeking concrete, practical, and architectural advice on how to tackle the problem.

The program's data flow might go something like this:

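  • A window needs to draw data
  • In the paint method, it calls a GetData method, often hundreds of times for hundreds of pieces of data in a single paint operation
  • This goes off and calculates, or reads from file, or does whatever else is required (often quite a complex data flow - think of this as data flowing through a complex graph, each node of which performs operations)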

That is, the paint message handler blocks while processing is done, and if the data hasn't already been calculated and cached, this can take a long time, sometimes minutes. Similar paths occur in other parts of the program that perform lengthy processing operations: the program is unresponsive for the entire time, sometimes hours.

I'm seeking advice on how to approach changing this. Practical ideas. Perhaps things like:

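  • Design patterns for requesting data asynchronously?
  • Storing large collections of objects so that threads can read and write them safely?
  • Handling invalidation of data sets while something is trying to read them?
  • Are there patterns and techniques for this sort of problem?
  • What should I be asking that I haven't thought of?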

I haven't done any multithreaded programming since my Uni days a few years ago, and I think the rest of my team is in a similar position. What I knew was academic, not practical, and is nowhere near enough to have confidence approaching this.

The ultimate objective is to have a fully responsive program, where all calculations and data generation are done in other threads and the UI is always responsive. We might not get there in a single development cycle :)

I thought I should add a couple more details about the app:

  • It's a 32-bit desktop application for Windows. Each copy is licensed. We plan to keep it a desktop, locally-running app
  • We use Embarcadero (formerly Borland) C++ Builder 2010 for development. This affects the parallel libraries we can use, since most seem (?) to be written for GCC or MSVC only. Luckily they're actively developing it and its C++ standards support is much better than it used to be. The compiler supports these Boost components.
  • Its architecture is not as clean as it should be and components are often too tightly coupled. This is another problem :)

Edit #2: Thanks for the replies so far!

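  • I'm surprised so many people have recommended a multi-process architecture (it's the top-voted answer at the moment) rather than multithreading. My impression is that this is a very Unix-style program structure, and I know nothing about how it is designed or how it works. Are there good resources about it on Windows? Is it really that common on Windows?
  • In terms of concrete approaches to some of the multithreading suggestions, are there design patterns for asynchronously requesting and consuming data, or thread-aware or asynchronous MVP systems, or guidance on designing a task-oriented system, or articles, books, and post-release deconstructions illustrating what works and what doesn't? We can develop all this architecture ourselves, of course, but it's better to build on what others have done before and to know which mistakes and pitfalls to avoid.
  • One aspect not touched on in any answer is managing this as a project. My impression is that estimating how long this will take, and keeping good control of the project while doing something this uncertain, may be hard. That's one reason I'm after recipes or practical coding advice, I guess: to guide and constrain the coding direction as much as possible.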

I haven't yet marked an answer for this question. This is not because of the quality of the answers, which is great (thank you), but simply because, given the scope of this, I'm hoping for more answers or discussion. Thank you to those who have already replied!

Recommended answer

So, there's a hint in your description of the algorithm as to how to proceed:

often quite a complex data flow - think of this as data flowing through a complex graph, each node of which performs operations

I'd look into making that data-flow graph be literally the structure that does the work. The links in the graph can be thread-safe queues, the algorithms at each node can stay pretty much unchanged, except wrapped in a thread that picks up work items from a queue and deposits results on one. You could go a step further and use sockets and processes rather than queues and threads; this will let you spread across multiple machines if there is a performance benefit in doing this.
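
As a rough illustration of the queue-per-link idea, here is a minimal sketch. It is written in standard C++11 for brevity; with an older compiler such as C++ Builder 2010 the same shape could presumably be built on Boost.Thread and function objects instead, but that is an assumption and untested here. The names `WorkQueue`, `start_node`, and `run_pipeline_demo` are made up for this example, and the two arithmetic "algorithms" merely stand in for real node operations.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical thread-safe FIFO used as a link between two graph nodes.
template <typename T>
class WorkQueue {
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!closed_ && "push after close");
            items_.push(std::move(item));
        }
        ready_.notify_one();
    }

    // Signals that no more items will arrive and wakes waiting consumers.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until an item is available; returns false once the queue
    // has been closed and fully drained.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
    bool closed_ = false;
};

// Wraps an otherwise unchanged single-threaded algorithm in a worker thread
// that takes work items from one queue and deposits results on another.
template <typename In, typename Out>
std::thread start_node(WorkQueue<In>& input, WorkQueue<Out>& output,
                       std::function<Out(const In&)> algorithm) {
    return std::thread([&input, &output, algorithm] {
        In item;
        while (input.pop(item)) output.push(algorithm(item));
        output.close();  // propagate end-of-stream to the next node
    });
}

// Two-node demo graph: raw -> (scale by 2) -> (add 1) -> finished.
std::vector<double> run_pipeline_demo(const std::vector<double>& samples) {
    WorkQueue<double> raw, scaled, finished;
    std::thread scale = start_node<double, double>(
        raw, scaled, [](const double& x) { return x * 2.0; });
    std::thread offset = start_node<double, double>(
        scaled, finished, [](const double& x) { return x + 1.0; });

    for (double s : samples) raw.push(s);
    raw.close();

    std::vector<double> results;
    double value;
    while (finished.pop(value)) results.push_back(value);
    scale.join();
    offset.join();
    return results;
}
```

Because each node here is a single thread reading from a FIFO, results come out in the order the work went in. Running several threads per node would give more parallelism but would lose that ordering unless items carry a sequence number.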

Then your paint and other GUI methods need to be split in two: one half to queue the work, and the other half to draw or use the results as they come out of the pipeline.
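
One possible shape for that split is sketched below, again in standard C++11 and under stated assumptions: `AsyncDataCache`, `try_get`, `paint`, and `run_paint_demo` are hypothetical names, the "computation" is a trivial stand-in, and the `on_ready` callback stands in for whatever mechanism asks the UI thread to repaint. In a real GUI application that callback would have to hand off to the UI thread (for example by posting a window message) rather than touch controls directly.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <map>
#include <mutex>
#include <queue>
#include <set>
#include <string>
#include <thread>
#include <vector>

// Hypothetical cache: the UI asks without blocking, a worker fills it in.
class AsyncDataCache {
public:
    typedef std::function<double(int)> Compute;
    typedef std::function<void(int)> Notify;

    AsyncDataCache(Compute compute, Notify on_ready)
        : compute_(compute), on_ready_(on_ready), stopping_(false),
          worker_([this] { run(); }) {}

    ~AsyncDataCache() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        worker_.join();
    }

    // First half: returns the cached value if present; otherwise queues
    // the request (once per key) and returns false immediately.
    bool try_get(int key, double& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        std::map<int, double>::const_iterator hit = ready_.find(key);
        if (hit != ready_.end()) {
            out = hit->second;
            return true;
        }
        if (requested_.insert(key).second) {
            pending_.push(key);
            wake_.notify_one();
        }
        return false;
    }

private:
    void run() {
        for (;;) {
            int key;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !pending_.empty(); });
                if (stopping_) return;
                assert(!pending_.empty());
                key = pending_.front();
                pending_.pop();
            }
            double value = compute_(key);  // slow work runs outside the lock
            {
                std::lock_guard<std::mutex> lock(mutex_);
                ready_[key] = value;
            }
            on_ready_(key);  // second half is triggered from here
        }
    }

    Compute compute_;
    Notify on_ready_;
    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<int> pending_;
    std::set<int> requested_;
    std::map<int, double> ready_;
    bool stopping_;
    std::thread worker_;  // declared last so it starts after the other members
};

// Stand-in for a paint handler: draws the value if it is cached,
// otherwise draws a placeholder and returns straight away.
std::string paint(AsyncDataCache& cache, int key) {
    double value;
    if (cache.try_get(key, value))
        return "value=" + std::to_string(static_cast<int>(value));
    return "loading";
}

// Simulates two paint passes separated by a "data ready" notification.
std::vector<std::string> run_paint_demo() {
    std::promise<void> repaint_requested;
    std::future<void> repaint = repaint_requested.get_future();
    AsyncDataCache cache(
        [](int key) { return key * 10.0; },
        [&repaint_requested](int) { repaint_requested.set_value(); });

    std::vector<std::string> frames;
    frames.push_back(paint(cache, 4));  // nothing cached yet
    repaint.wait();                     // worker reports the data is ready
    frames.push_back(paint(cache, 4));  // now served from the cache
    return frames;
}
```

The paint handler never waits on the computation: the first pass draws a placeholder, and the second pass, prompted by the notification, draws the real data. Invalidation of cached entries is deliberately left out of this sketch.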

This may not be practical if the app presumes that data is global. But if it is well contained in classes, as your description suggests it may be, then this could be the simplest way to get it parallelised.
