Load Balancing in Amazon EC2?

Question

We've been fighting with HAProxy in Amazon EC2 for a few days now; the experience has been great so far, but we're stuck on squeezing more performance out of the software load balancer. We're not exactly Linux networking whizzes (we're normally a .NET shop), but so far we've held our own: we've tried setting proper ulimits and inspected kernel messages and tcpdumps for any irregularities. So far, though, we've hit a plateau of about 1,700 requests/sec, at which point client timeouts abound (we've been using and tweaking httperf for load generation). A coworker and I were listening to the most recent Stack Overflow podcast, in which the Reddit founders note that their entire site runs off one HAProxy node and that it hasn't become a bottleneck so far. Ack! Either they're somehow not seeing that many concurrent requests, we're doing something horribly wrong, or the shared nature of EC2 is limiting the network stack of our EC2 instance (we're using a large instance type). Considering that both Joel and the Reddit founders agree that the network will likely be the limiting factor, is it possible that's the limitation we're seeing?
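The "proper ulimits" mentioned above are mostly about the open-file limit, since every socket is a file descriptor. Below is a minimal Python sketch for checking it, assuming a Unix host and the rule of thumb that a proxy such as HAProxy holds roughly two descriptors per proxied connection (one client side, one server side). The helper names and the `reserve` headroom are hypothetical illustrations, not HAProxy settings.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) RLIMIT_NOFILE limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def enough_fds(maxconn, soft_limit, fds_per_conn=2, reserve=64):
    """Check whether soft_limit covers `maxconn` proxied connections.

    Assumption: each proxied connection uses one client-side and one
    server-side socket. `reserve` is hypothetical headroom for listeners,
    log sockets and health checks.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= maxconn * fds_per_conn + reserve


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"soft={soft} hard={hard} ok_for_10000_conns={enough_fds(10000, soft)}")
```

If the soft limit is too low, it has to be raised for the process that actually runs the load balancer (or the load generator), not just for an interactive shell.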

Any thoughts are greatly appreciated!

Edit: It looks like the actual issue was not, in fact, with the load balancer node! In this case, the culprit was the nodes running httperf. Because httperf builds and tears down a socket for each request, it spends a good amount of CPU time in the kernel. As we bumped the request rate higher, the TCP FIN TTL (60s by default) was keeping sockets around too long, and the default ip_local_port_range was too small for this usage scenario. Basically, after a few minutes of the client (httperf) node constantly creating and destroying new sockets, it ran out of unused ports, and subsequent 'requests' errored out at that stage, yielding low requests/sec numbers and a large number of errors.
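The port exhaustion described above can be estimated with simple arithmetic. This is a simplified model, not the kernel's exact behavior: it assumes every closed connection keeps its local port unavailable for `hold_seconds` (60s in the description above) and that all connections go to a single destination IP:port, so the sustainable rate of new connections is roughly ports / hold time. The default range used in the example (32768 to 61000) is an assumption about kernels of that era; check your own system. The widened range and 30s hold are hypothetical values for comparison.

```python
def port_range_size(low, high):
    """Number of ephemeral ports in an inclusive ip_local_port_range."""
    return high - low + 1


def sustainable_conn_rate(low, high, hold_seconds):
    """Steady-state new connections/sec to one destination before ports run out.

    Simplified model: each closed connection keeps its local port busy for
    `hold_seconds`, so at most ports / hold_seconds new connections can be
    opened per second toward the same destination IP:port.
    """
    return port_range_size(low, high) / hold_seconds


if __name__ == "__main__":
    # Assumed older-kernel default range, 60s hold: about 470 new conns/sec.
    print(f"{sustainable_conn_rate(32768, 61000, 60):.1f}")
    # Hypothetical widened range and shorter hold: about 2,133 new conns/sec.
    print(f"{sustainable_conn_rate(1024, 65000, 30):.1f}")
```

Real clients can exceed this estimate by spreading connections over several destination addresses or reusing connections (keep-alive), which is why the model is only a rough ceiling for a one-socket-per-request generator like the one described.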

We had also looked at nginx, but we've been working with RightScale, and they've got drop-in scripts for HAProxy. Oh, and [of course] our deadline is too tight to switch out components unless it proves absolutely necessary. Mercifully, being on AWS allows us to test another setup using nginx in parallel (if warranted) and make the switch overnight later on.

This page describes each of the sysctl variables fairly well (ip_local_port_range and tcp_fin_timeout were tuned, in this case).
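A minimal Python sketch for inspecting the two variables mentioned above, assuming a Linux host where sysctls are exposed under `/proc/sys` (dots in the sysctl name map to path separators). The helper names are hypothetical, and the target values in the `__main__` section are illustrative only; the edit above does not say which values were actually used.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def sysctl_path(name: str, root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl name to its file, e.g.
    net.ipv4.tcp_fin_timeout -> /proc/sys/net/ipv4/tcp_fin_timeout."""
    return Path(root, *name.split("."))


def read_sysctl(name: str, root: str = "/proc/sys") -> str:
    """Read a sysctl value as stripped text (Linux only)."""
    return sysctl_path(name, root).read_text().strip()


def local_port_range(root: str = "/proc/sys") -> Tuple[int, int]:
    """Return (low, high) from net.ipv4.ip_local_port_range (whitespace-separated)."""
    low, high = read_sysctl("net.ipv4.ip_local_port_range", root).split()
    return int(low), int(high)


def fin_timeout(root: str = "/proc/sys") -> int:
    """Return net.ipv4.tcp_fin_timeout in seconds."""
    return int(read_sysctl("net.ipv4.tcp_fin_timeout", root))


def sysctl_conf_lines(settings: Dict[str, str]) -> List[str]:
    """Render settings in the 'key = value' form used by /etc/sysctl.conf."""
    return [f"{key} = {value}" for key, value in settings.items()]


if __name__ == "__main__":
    print("ip_local_port_range:", local_port_range())
    print("tcp_fin_timeout:", fin_timeout())
    # Hypothetical target values, for illustration only.
    for line in sysctl_conf_lines({
        "net.ipv4.ip_local_port_range": "1024 65000",
        "net.ipv4.tcp_fin_timeout": "30",
    }):
        print(line)
```

The printed lines could be appended to `/etc/sysctl.conf` and loaded as root with `sysctl -p`; on the setup described above, the change matters on the httperf client nodes rather than on the load balancer.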

Recommended answer

Not really an answer to your question, but nginx and Pound both have good reputations as load balancers. WordPress just switched to nginx with good results.

More specifically, to debug your problem: if you aren't seeing 100% CPU usage (including I/O wait), then yes, you are network bound. EC2 uses a gigabit network internally; try an XL instance so that you have the underlying hardware to yourself and don't have to share that gigabit network port.
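The CPU check suggested above can be sketched in Python by sampling the aggregate `cpu` line of Linux's `/proc/stat` twice and comparing the counters. This is a minimal sketch, assuming the standard field order (user, nice, system, idle, iowait, irq, softirq, steal); the helper names are hypothetical. Following the answer's heuristic, everything except idle counts as busy, which means I/O wait and hypervisor steal time (relevant on a shared EC2 host) are both treated as busy here.

```python
from typing import List


def parse_cpu_line(line: str) -> List[int]:
    """Parse the aggregate 'cpu' line of /proc/stat into its first 8 counters.

    Order: user nice system idle iowait irq softirq steal. Later fields
    (guest, guest_nice) are already included in user/nice, so they are dropped.
    """
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in fields[1:9]]


def busy_fraction(before: str, after: str) -> float:
    """Fraction of non-idle CPU time between two /proc/stat samples.

    I/O wait (and steal) count as busy, matching the 'including I/O wait'
    rule of thumb. Returns 0.0 if no time elapsed between samples.
    """
    b, a = parse_cpu_line(before), parse_cpu_line(after)
    deltas = [y - x for x, y in zip(b, a)]
    total = sum(deltas)
    idle = deltas[3]
    return (total - idle) / total if total else 0.0


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Return the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as f:
        return f.readline()


if __name__ == "__main__":
    import time

    first = read_cpu_line()
    time.sleep(1.0)
    second = read_cpu_line()
    print(f"CPU busy (incl. iowait): {busy_fraction(first, second):.1%}")
```

Running this on the load balancer while the benchmark is at its plateau gives a quick read: a value well below 100% points toward a network (or client-side) limit rather than CPU.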
