Deploying a (single node) Django Web application with virtually zero downtime on EC2

Problem description

Question: What are good strategies for achieving 0 (or as close as possible to 0) downtime when using Django?

Most of the answers I read say "use south" or "use fabric", but those are very vague answers, IMHO. I actually use both, and am still wondering how to get as close to zero downtime as possible.

Some details:

I have a decently sized Django application that I host at EC2. I use South for schema and data migrations as well as fabric with boto for automating repetitive deployment/backup tasks that get triggered through a set of Jenkins (continuous integration server) tasks. The database I use is a standard PostgreSQL 9.0 instance.
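To give an idea of the automation involved, such a Fabric task might look roughly like the sketch below (Fabric 1.x style; the host name, project path and service name are placeholders, not the real setup):

```python
# fabfile.py -- rough sketch of a Fabric 1.x deploy task; the host name,
# project path and service name are placeholders, not the real setup.
from fabric.api import cd, env, run, sudo, task

env.hosts = ['staging.example.com']   # assumed host
env.user = 'deploy'                   # assumed deploy user

@task
def deploy(branch='master'):
    """Pull the latest code, apply South migrations and restart the app."""
    with cd('/srv/myapp'):                                   # assumed path
        run('git pull origin %s' % branch)
        run('venv/bin/pip install -r requirements.txt')
        run('venv/bin/python manage.py migrate')             # South's migrate
        run('venv/bin/python manage.py collectstatic --noinput')
    sudo('service gunicorn restart')                         # assumed app server
```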

I have a...

  1. staging server, which gets constantly edited by our team with all the new content and is loaded with the latest and greatest code, and a...

  2. live server that keeps changing with user accounts and user data - all recorded in PostgreSQL.

Current deployment strategy:

When deploying new code and content, EC2 snapshots of both servers (live and staging) are created. The live server is switched to an "Updating new content" page...
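Creating those snapshots is itself scriptable; a minimal sketch with boto3 (the successor to the boto library mentioned above; the region and instance IDs are placeholders) could look like this:

```python
# snapshot_servers.py -- sketch of imaging the live and staging instances
# with boto3; the region and instance IDs are placeholders.
import datetime
import boto3

ec2 = boto3.client('ec2', region_name='us-east-1')   # assumed region

def snapshot_instance(instance_id, label):
    """Create an AMI of the instance so it can be cloned or rolled back."""
    stamp = datetime.datetime.utcnow().strftime('%Y%m%d-%H%M')
    image = ec2.create_image(
        InstanceId=instance_id,
        Name='%s-%s' % (label, stamp),
        NoReboot=True,          # don't reboot the live box mid-deploy
    )
    return image['ImageId']

if __name__ == '__main__':
    print(snapshot_instance('i-0123456789abcdef0', 'live'))      # placeholder ID
    print(snapshot_instance('i-0fedcba9876543210', 'staging'))   # placeholder ID
```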

Downtime begins.

The live-clone server gets migrated to the same schema version as the staging server (using South). A dump of only the tables and sequences that I want preserved from live gets created (particularly, the user accounts along with their data). Once this is done, the dump gets uploaded to the staging-clone server. The tables that were preserved from live are truncated and the data gets inserted. As the data in my live server grows, this step obviously keeps taking longer.
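The dump-and-reload step could be expressed roughly as in the sketch below (the hosts, database name and preserved table list such as auth_user are placeholders, and credentials are assumed to come from .pgpass):

```python
# migrate_user_data.py -- rough sketch of the dump/truncate/reload step.
# Hosts, database name and the preserved table list are placeholders, and
# credentials are assumed to come from ~/.pgpass.
import subprocess

PRESERVED_TABLES = ['auth_user', 'accounts_profile']   # example tables only
DUMP_FILE = '/tmp/preserved.sql'

def sh(cmd):
    print(' '.join(cmd))
    subprocess.check_call(cmd)

# 1. Data-only dump of just the preserved tables from the migrated live clone
#    (sequences can be pulled in the same way with extra -t patterns).
dump_cmd = ['pg_dump', '-h', 'live-clone.internal', '-U', 'app',
            '--data-only', '-f', DUMP_FILE]
for table in PRESERVED_TABLES:
    dump_cmd += ['-t', table]
dump_cmd.append('appdb')
sh(dump_cmd)

# 2. Truncate the same tables on the staging clone, then load the dump into it.
truncate_sql = 'TRUNCATE %s CASCADE;' % ', '.join(PRESERVED_TABLES)
sh(['psql', '-h', 'staging-clone.internal', '-U', 'app', '-d', 'appdb',
    '-c', truncate_sql])
sh(['psql', '-h', 'staging-clone.internal', '-U', 'app', '-d', 'appdb',
    '--single-transaction', '-f', DUMP_FILE])
```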

Once the load is complete, the Elastic IP of the live server gets re-associated with the staging clone (which is thus promoted to be the new live server). The live instance and the live-clone instance get terminated.
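The Elastic IP switch itself is a single API call; a sketch with boto3 (allocation and instance IDs are placeholders) might be:

```python
# promote_staging_clone.py -- sketch of re-pointing the Elastic IP at the
# staging clone with boto3; the IDs are placeholders.
import boto3

ec2 = boto3.client('ec2', region_name='us-east-1')   # assumed region

# Re-associate the existing Elastic IP with the new instance; for VPC
# addresses, AllowReassociation lets the call take the IP from the old box.
ec2.associate_address(
    AllocationId='eipalloc-0123456789abcdef0',   # the live Elastic IP
    InstanceId='i-0fedcba9876543210',            # the promoted staging clone
    AllowReassociation=True,
)
```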

Downtime ends.

Yes, this works, but as data grows, my "virtual" zero downtime gets further and further away. Of course, something that has crossed my mind is to somehow leverage replication and to start looking into PostgreSQL replication and "eventually consistent" approaches. I know there is some magic I could perhaps do with load balancers, but the issue of accounts created in the meantime makes it tricky.

What would you recommend I look at?

Update:

I have a typical Django single-node application. I was hoping for a solution that would go more in depth into Django-specific issues. For example, the idea of using Django's support for multiple databases with custom routers alongside replication has crossed my mind. There are issues related to that which I hope an answer would touch upon.
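To make the multi-database idea concrete, a read/write-splitting router is the usual shape; below is a minimal sketch, assuming a 'replica' alias is defined in DATABASES next to 'default' (on the Django versions of that era the last hook was allow_syncdb rather than allow_migrate):

```python
# routers.py -- minimal sketch of a read/write-splitting database router;
# assumes DATABASES defines a 'default' primary and a 'replica' standby.
class PrimaryReplicaRouter(object):

    def db_for_read(self, model, **hints):
        # Reads go to the streaming-replication standby; a freshly written
        # row may not be visible there yet (replication lag).
        return 'replica'

    def db_for_write(self, model, **hints):
        # All writes go to the primary.
        return 'default'

    def allow_relation(self, obj1, obj2, **hints):
        # Both aliases point at the same data set, so relations are fine.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Only ever run schema migrations against the primary.
        return db == 'default'

# settings.py (excerpt):
# DATABASE_ROUTERS = ['myproject.routers.PrimaryReplicaRouter']
```

Whether reads can safely hit the replica depends on how much replication lag the application tolerates, which is exactly the "eventually consistent" concern mentioned above.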

Solution

What might be interesting to look at is a technique called Canary Releasing. I saw a great presentation by Jez Humble last year at a software conference in Amsterdam; it was about low-risk releases, and the slides are here.

The idea is not to switch all systems at once, but to send a small set of users to the new version. Only when all performance metrics of the new system are as expected is everyone else switched over as well. I know that this technique is also used by big sites like Facebook.
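At the application level, picking a stable subset of users for the canary can be as simple as hashing a session identifier into a bucket. The sketch below is a hypothetical Django middleware (cookie name and percentage are arbitrary) that marks roughly 5% of traffic with a cookie that an upstream load balancer could route on:

```python
# canary.py -- hypothetical sketch of assigning ~5% of users to a canary
# release; an upstream load balancer would route on the resulting cookie.
import hashlib

CANARY_COOKIE = 'release_channel'
CANARY_PERCENT = 5   # roughly 5% of users see the new version

class CanaryMiddleware(object):
    """Old-style Django middleware; must sit after SessionMiddleware."""

    def process_request(self, request):
        if CANARY_COOKIE in request.COOKIES:
            request.release_channel = request.COOKIES[CANARY_COOKIE]
            return None
        # Hash something stable (session key, falling back to client IP)
        # into 0-99 so the assignment is sticky per user.
        key = request.session.session_key or request.META.get('REMOTE_ADDR', '')
        bucket = int(hashlib.md5(key.encode('utf-8')).hexdigest(), 16) % 100
        request.release_channel = 'canary' if bucket < CANARY_PERCENT else 'stable'
        return None

    def process_response(self, request, response):
        response.set_cookie(CANARY_COOKIE,
                            getattr(request, 'release_channel', 'stable'))
        return response
```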
