Developing software for a "no downtime environment"


Question


Currently in my organization we have a 2-hour downtime window early on Sunday mornings during which we can deploy our latest code and SQL to the production environment. We received word the other day from the higher-ups that in a few months we will need to have 100% uptime, thus losing our 2-hour window.



There is obviously a lot that goes into achieving 24x7 uptime with both software and hardware, but I was just curious whether anyone currently has to work in this type of environment and, if so, how do you do it? How do you make changes to your tables in SQL if tables have to be locked? How does the code in your applications respond when the database is being changed right underneath it?

Recommended answer

This isn't as uncommon as you'd think: any solution that makes money really needs zero downtime, since downtime leads to lost revenue.

First, you obviously need fault tolerance and redundancy built into the hardware: proper load balancing, clustering, generators, UPS and so on.

You need to identify the failure points in the application, and ensure that there is a way to automatically compensate for the failure.

So you need things like a heartbeat monitor for application health, plus monitoring of CPU %, disk space, bandwidth utilisation and so on: the usual suspects.
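As a rough illustration of the heartbeat idea, here is a minimal Python sketch using only the standard library. The class name `HeartbeatMonitor`, the function `disk_usage_percent`, the instance names and the 30-second timeout are all assumptions made up for this example; a real deployment would more likely rely on an existing monitoring product.

```python
import shutil
import time
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each application instance.

    Hypothetical helper for illustration only.
    """
    timeout_seconds: float = 30.0  # assumed threshold, tune per system
    last_seen: dict = field(default_factory=dict)

    def beat(self, instance, now=None):
        # Record that `instance` reported in; `now` can be injected for testing.
        self.last_seen[instance] = time.monotonic() if now is None else now

    def stale_instances(self, now=None):
        # Instances whose last heartbeat is older than the timeout.
        now = time.monotonic() if now is None else now
        return sorted(
            name for name, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )


def disk_usage_percent(path="/"):
    """Percentage of the disk holding `path` that is currently used."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total
```

Each instance would call `beat()` on a timer, and a separate watchdog would poll `stale_instances()` and `disk_usage_percent()` and raise an alert when either crosses its threshold.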

You would also need to consider how your application deals with errors - will it always recover, and always notify you in some way of the failure, to allow you to react to the problem?
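One way to make "recover and notify" concrete is a retry wrapper that reports every failure. This is only a sketch: `call_with_recovery`, the `notify` callback and the backoff values are hypothetical names and numbers chosen for illustration, not part of any particular library.

```python
import time


def call_with_recovery(operation, notify, attempts=3, base_delay=0.5,
                       sleep=time.sleep):
    """Run `operation`, retrying on failure and reporting every failure.

    `notify` is any callable that accepts a message string, for example
    something that pages an operator. All names here are illustrative.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:
            # Always tell someone, even if a later retry succeeds.
            notify(f"attempt {attempt}/{attempts} failed: {exc!r}")
            if attempt == attempts:
                raise  # out of retries, surface the error to the caller
            # Exponential backoff: base_delay, then 2x, then 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter is injectable so the behaviour can be exercised in tests without real delays.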

What happens if the machine reboots? Does the app restart? How quickly will the load balancer redirect traffic away from that server? And so on.
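How fast traffic moves away from a dead server depends largely on how the load balancer counts failed health checks. The toy pool below sketches that logic; `BackendPool`, `max_failures` and the backend names are assumptions for this example and do not mirror any specific load balancer's configuration.

```python
class BackendPool:
    """Round-robin pool that ejects a backend after repeated failed checks.

    Illustrative only; real load balancers implement this internally.
    """

    def __init__(self, backends, max_failures=3):
        self.max_failures = max_failures  # assumed ejection threshold
        self.failures = {name: 0 for name in backends}
        self._next = 0

    def record_check(self, name, healthy):
        # A passing check resets the counter; a failing one increments it.
        self.failures[name] = 0 if healthy else self.failures[name] + 1

    def in_rotation(self):
        # Backends still below the failure threshold.
        return [n for n, f in self.failures.items() if f < self.max_failures]

    def pick(self):
        # Choose the next live backend in round-robin order.
        live = self.in_rotation()
        if not live:
            raise RuntimeError("no healthy backends")
        choice = live[self._next % len(live)]
        self._next += 1
        return choice
```

Under this scheme the worst-case time to stop sending traffic to a failed server is roughly the check interval multiplied by `max_failures`, which is the number to compare against your uptime target.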

Assuming that your software really is fault tolerant and capable of 100% uptime, you now have the opportunity to take machines offline, upgrade them, bring them back into the load-balanced pool, and then take the others down, thereby maintaining uptime. That said, it's not usually that easy, and you often do have to plan for a maintenance window to use should something go wrong.
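The upgrade procedure described above can be sketched as a simple loop. This is a simplified model under stated assumptions: `rolling_upgrade`, the `upgrade` and `is_healthy` callbacks, and `min_in_service` are hypothetical names, and the sketch ignores connection draining and version compatibility between old and new code.

```python
def rolling_upgrade(servers, upgrade, is_healthy, min_in_service=1):
    """Upgrade servers one at a time while keeping some in service.

    `upgrade(server)` and `is_healthy(server)` are caller-supplied
    callbacks (hypothetical). Returns (upgraded, in_service).
    """
    in_service = list(servers)
    upgraded = []
    for server in servers:
        # Refuse to drain a server if that would leave too little capacity.
        if len(in_service) - 1 < min_in_service:
            raise RuntimeError("not enough capacity to drain " + server)
        in_service.remove(server)   # take it out of the pool
        upgrade(server)
        if not is_healthy(server):
            # Stop the rollout; the failed server stays out of the pool
            # and the remaining servers keep serving traffic.
            return upgraded, in_service
        in_service.append(server)   # bring it back into the pool
        upgraded.append(server)
    return upgraded, in_service
```

Stopping at the first unhealthy server is a deliberate choice here: it limits the damage of a bad release to one machine, at the cost of leaving the fleet on mixed versions until someone intervenes.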

The key to the whole process is planning, analysis, planning, and more planning.

Do you have the software hosted on hardware with a 100% uptime guarantee, or is this just a case of 'attempt to keep the software up and running for 100% of the time'? Realistically, most software will have some downtime; the slight exceptions to the rule are financial institutions, where an upgrade is a major undertaking. You need to consider what impact downtime actually has, and the effort involved in maintaining uptime. Doing 100% uptime properly costs a decent amount of money and effort.

Cheers,

Martin.

