Parallel redundant calculations - does this make sense?

Problem description

A thought flashed through my mind about a month ago, and I am still thinking about it. On the one hand, it may seem like a joke; on the other hand, it may have a very serious basis. I decided to share it with the community and ask for your opinion.

I once ran into a memory fault in the platform that affected the application's durability. It made me think:
- RAID is a well-known technology. It was designed to increase the durability of data storage by using redundant arrays of independent (or inexpensive) disks;
- CPUs and RAM are now cheaper and more parallel (more independent);
- Could we introduce a technique, say RPTC (Redundant Parallel Thread Calculations), in which the same calculations are performed independently in parallel, much as RAID stores data redundantly?

So, if the independent calculations produce the same result, that indicates the process was performed without error. If the results differ, an error has occurred. The advantage is more durable calculations.
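The compare-the-copies idea can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a fault-tolerance library: `run_redundant` and `RedundancyMismatch` are hypothetical names, and results are compared with plain `==`, which assumes the calculation is deterministic. Note that in CPython the global interpreter lock interleaves CPU-bound threads rather than running them truly in parallel, and all copies read the same memory, so a fault that corrupts the shared input would give every copy the same wrong answer.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RedundancyMismatch(Exception):
    """Raised when redundant runs of the same calculation disagree."""


def run_redundant(calc: Callable[[], T], copies: int = 2) -> T:
    """Run `calc` `copies` times on separate worker threads and compare results.

    Identical results are treated as "no error detected"; any difference
    raises RedundancyMismatch. This detects a fault but cannot correct it.
    """
    if copies < 2:
        raise ValueError("need at least two copies to compare")
    with ThreadPoolExecutor(max_workers=copies) as pool:
        # Submit every copy first so the copies can overlap in time.
        futures = [pool.submit(calc) for _ in range(copies)]
        results: List[T] = [f.result() for f in futures]
    # Assumes calc is deterministic and its results are comparable with ==.
    first = results[0]
    if any(r != first for r in results[1:]):
        raise RedundancyMismatch(f"redundant results differ: {results!r}")
    return first


if __name__ == "__main__":
    # Sum of squares 0..999, computed twice and cross-checked.
    print(run_redundant(lambda: sum(i * i for i in range(1000))))  # 332833500
```

With two copies this can only detect a disagreement; deciding which copy is right needs a third copy and a vote.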

The serious reason for considering this technique is the likelihood of hardware RAM errors combined with the growing volume of calculations. RAM chips are known to fail occasionally (there is plenty of information about this on the Internet).

Areas of application: nuclear plants, medicine, space, and any other relevant areas.

Does this make sense?

What I have tried:

Once, the application started to report memory fragmentation errors (at the OS or hardware level). The application was running in a virtual machine, and the problem disappeared only after rebooting both the guest and the host. I am a programmer, not a system engineer, but it seems that virtualization increases the likelihood of platform errors.

Solution

A better method for safety-critical systems is true redundancy: Redundancy (engineering) - Wikipedia, the free encyclopedia
The hardware is duplicated, the software is written by different teams, and two out of three can "out-vote" the third. Purely duplicating the same calculation in different threads on the same processor doesn't give you much security: think about how many "soft errors" you get in an average year and compare that to the number of bugs you encounter in the same period... :laugh:
It's an idea, but I'm not convinced that the performance hit you would take would in any way balance the small advantage you would get.
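The two-out-of-three scheme in this answer is usually called triple modular redundancy. The Python sketch below models only the voting logic and the idea of diverse implementations; real triple modular redundancy also replicates the hardware, which a single-process example cannot do. `majority_vote`, `NoMajority` and the three `squares_*` functions are hypothetical names chosen for illustration.

```python
from collections import Counter
from typing import Hashable, Sequence, TypeVar

T = TypeVar("T", bound=Hashable)


class NoMajority(Exception):
    """Raised when no value is reported by a strict majority of channels."""


def majority_vote(results: Sequence[T]) -> T:
    """Return the value reported by more than half of the redundant channels.

    With three channels this is two-out-of-three voting: one faulty channel
    is out-voted by the two that agree.
    """
    if not results:
        raise ValueError("no results to vote on")
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise NoMajority(f"no majority among {list(results)!r}")
    return value


# Three deliberately different implementations of the same function,
# standing in for software "written by different teams".
def squares_loop(n: int) -> int:
    total = 0
    for i in range(n):
        total += i * i
    return total


def squares_formula(n: int) -> int:
    # Closed form for 0^2 + 1^2 + ... + (n-1)^2.
    return (n - 1) * n * (2 * n - 1) // 6


def squares_generator(n: int) -> int:
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    channels = [squares_loop, squares_formula, squares_generator]
    print(majority_vote([f(1000) for f in channels]))  # 332833500
    print(majority_vote([42, 42, 41]))  # 42: the odd channel is out-voted
```

Voting requires a strict majority, so three channels tolerate one wrong answer, while an even split such as `[1, 1, 2, 2]` raises `NoMajority` instead of guessing.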


