Is using a load balancer with ElasticSearch unnecessary?

Question

I have a cluster of 3 ElasticSearch nodes running on AWS EC2. These nodes are set up using OpsWorks/Chef. My intent is to design this cluster to be very resilient and elastic (nodes can come in and out when needed).

From everything I've read about ElasticSearch, it seems like no one recommends putting a load balancer in front of the cluster; instead, it seems like the recommendation is to do one of two things:

  1. Point your client at the URL/IP of one node, let ES do the load balancing for you and hope that node never goes down.

  2. Hard-code the URLs/IPs of ALL your nodes into your client app and have the app handle the failover logic.
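Option 2 comes down to a small amount of client code. A minimal sketch in Python, assuming a hypothetical send(node, request) callable that stands in for your app's HTTP call and raises ConnectionError when a node is unreachable; FailoverClient and the node URLs are illustrative names, not part of any Elasticsearch library:

```python
import itertools
from typing import Callable, List


class FailoverClient:
    """Hypothetical client-side failover over a hard-coded node list.

    `send` stands in for whatever HTTP call your app makes: it takes a node
    URL and a request and raises ConnectionError if the node is unreachable.
    """

    def __init__(self, nodes: List[str], send: Callable[[str, str], str]):
        if not nodes:
            raise ValueError("need at least one node")
        self._nodes = list(nodes)
        self._send = send
        # Rotate the first node tried so load is spread across the cluster.
        self._next_start = itertools.cycle(range(len(self._nodes)))

    def request(self, req: str) -> str:
        start = next(self._next_start)
        errors = []
        for offset in range(len(self._nodes)):
            node = self._nodes[(start + offset) % len(self._nodes)]
            try:
                return self._send(node, req)
            except ConnectionError as exc:
                # This node looks dead for now: fall through to the next one.
                errors.append(f"{node}: {exc}")
        raise ConnectionError("all nodes failed: " + "; ".join(errors))


if __name__ == "__main__":
    # Hypothetical node URLs; the second one is simulated as down.
    NODES = ["http://10.0.0.1:9200", "http://10.0.0.2:9200", "http://10.0.0.3:9200"]

    def fake_send(node: str, req: str) -> str:
        if node == NODES[1]:
            raise ConnectionError("connection refused")
        return f"{node} handled {req}"

    client = FailoverClient(NODES, fake_send)
    for _ in range(3):
        print(client.request("GET /_cluster/health"))
```

The sketch rotates the starting node on each call and falls through to the next node on failure. A production client would also remember dead nodes and retry them after a back-off instead of probing them on every request; Elasticsearch's official client libraries accept a list of hosts and handle this kind of failover for you.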

My background is mostly in web farms where it's just common sense to create a huge pool of autonomous web servers, throw an ELB in front of them and let the load balancer decide what nodes are alive or dead. Why does ES not seem to support this same architecture?

Recommended Answer

You don't need a load balancer: ES already provides that functionality. You'd just be adding another component, which could misbehave and would add an unnecessary network hop.

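ES will shard your data (by default into 5 shards in the versions current when this was written; Elasticsearch 7.0 and later default to 1 shard per index), which it will try to distribute evenly among your instances. In your case, two instances would hold 2 shards each and the third just one, so you might want to set the number of shards to 6 for an even distribution (the shard count is set when an index is created).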
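The 2/2/1 split is plain integer arithmetic: 5 shards cannot be divided evenly over 3 nodes, while 6 can. A minimal sketch, assuming a simple round-robin placement of primaries (a hypothetical simplification, not how Elasticsearch's allocator is implemented):

```python
from collections import Counter
from typing import List


def primaries_per_node(num_shards: int, nodes: List[str]) -> Counter:
    """Deal primary shards out round-robin and count how many land on each node.

    Simplified: the real allocator balances total shards per node and applies
    other constraints, but the arithmetic of uneven division is the same.
    """
    return Counter(nodes[shard % len(nodes)] for shard in range(num_shards))


nodes = ["node0", "node1", "node2"]
print(sorted(primaries_per_node(5, nodes).values()))  # [1, 2, 2]
print(sorted(primaries_per_node(6, nodes).values()))  # [2, 2, 2]
```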

By default, replication is set to "number_of_replicas": 1, so there is one replica of each shard. Assuming you are using 6 shards, it could look something like this (R marks a replica shard):

  • node0: 1, 4, R3, R6
  • node1: 2, 6, R1, R5
  • node2: 3, 5, R2, R4
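What makes this layout resilient is that no shard has its primary and its replica on the same node (Elasticsearch never allocates a replica next to its own primary). A minimal sketch that checks this property for the layout above; the string encoding ("R3" = replica of shard 3) and the helper names are hypothetical, chosen just for illustration:

```python
from typing import Dict, List

# The six-shard layout above; "R" marks a replica.
LAYOUT: Dict[str, List[str]] = {
    "node0": ["1", "4", "R3", "R6"],
    "node1": ["2", "6", "R1", "R5"],
    "node2": ["3", "5", "R2", "R4"],
}


def copies(layout: Dict[str, List[str]]) -> Dict[int, Dict[str, str]]:
    """Map shard number -> {"primary": node, "replica": node}."""
    result: Dict[int, Dict[str, str]] = {}
    for node, entries in layout.items():
        for entry in entries:
            kind = "replica" if entry.startswith("R") else "primary"
            slot = result.setdefault(int(entry.lstrip("R")), {})
            if kind in slot:
                raise ValueError(f"shard {entry.lstrip('R')} has two {kind} copies")
            slot[kind] = node
    return result


def is_safe(layout: Dict[str, List[str]]) -> bool:
    """Every shard has one primary and one replica, on different nodes."""
    return all(
        set(c) == {"primary", "replica"} and c["primary"] != c["replica"]
        for c in copies(layout).values()
    )


def survives_loss_of(layout: Dict[str, List[str]], dead: str) -> bool:
    """Every shard still has at least one copy on a node other than `dead`."""
    return all(
        any(node != dead for node in c.values()) for c in copies(layout).values()
    )


print(is_safe(LAYOUT))                                   # True
print(all(survives_loss_of(LAYOUT, n) for n in LAYOUT))  # True
```

With one replica per shard, any single node can be lost without losing data; losing two of the three nodes at once could remove both copies of some shard.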

Assuming node1 dies, the cluster would change to the following setup:

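  • node0: 1, 4, 6, R3 + new replicas R5, R2
  • node2: 3, 5, 2, R4 + new replicas R1, R6

Shards 6 (on node0) and 2 (on node2) were the replicas R6 and R2 before the failure and have been promoted to primaries; new replicas are then created for the four shards (1, 2, 5, 6) that lost a copy.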
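The recovery above can be reproduced with a small simulation: first promote the replica of every primary that was on the dead node, then re-create each missing replica on a surviving node that does not hold that shard's primary. A minimal sketch, assuming a simplified "least-loaded node" placement rule (hypothetical; the real allocator weighs more factors). With only two survivors the placement is forced anyway:

```python
from typing import Dict, List, Set, Tuple

LAYOUT: Dict[str, List[str]] = {
    "node0": ["1", "4", "R3", "R6"],
    "node1": ["2", "6", "R1", "R5"],
    "node2": ["3", "5", "R2", "R4"],
}


def parse(layout: Dict[str, List[str]]) -> Dict[int, Dict[str, str]]:
    """Shard number -> {"primary": node, "replica": node}."""
    shards: Dict[int, Dict[str, str]] = {}
    for node, entries in layout.items():
        for entry in entries:
            kind = "replica" if entry.startswith("R") else "primary"
            shards.setdefault(int(entry.lstrip("R")), {})[kind] = node
    return shards


def fail_node(layout: Dict[str, List[str]], dead: str) -> Dict[str, Tuple[Set[int], Set[int]]]:
    """Simulate losing `dead`; return node -> (primaries, replicas) for survivors.

    Assumes every shard starts with one primary and one replica on different nodes.
    """
    survivors = sorted(n for n in layout if n != dead)
    shards = parse(layout)
    # Step 1: promote replicas whose primary was lost; drop lost replicas.
    for c in shards.values():
        if c["primary"] == dead:
            c["primary"] = c.pop("replica")
        elif c["replica"] == dead:
            del c["replica"]
    # Step 2: re-create missing replicas, never next to their own primary,
    # preferring the least-loaded survivor (a simplified allocation rule).
    load = {n: 0 for n in survivors}
    for c in shards.values():
        for node in c.values():
            load[node] += 1
    for shard in sorted(shards):
        c = shards[shard]
        if "replica" in c:
            continue
        candidates = [n for n in survivors if n != c["primary"]]
        if not candidates:
            continue  # nowhere to put it: the replica stays unassigned
        target = min(candidates, key=lambda n: (load[n], n))
        c["replica"] = target
        load[target] += 1
    result: Dict[str, Tuple[Set[int], Set[int]]] = {n: (set(), set()) for n in survivors}
    for shard, c in shards.items():
        result[c["primary"]][0].add(shard)
        if "replica" in c:
            result[c["replica"]][1].add(shard)
    return result


print(fail_node(LAYOUT, "node1"))
```

For node1 failing, this yields primaries 1, 4, 6 and replicas 2, 3, 5 on node0, and primaries 2, 3, 5 and replicas 1, 4, 6 on node2, matching the layout described above. If only one node survives, the replicas have nowhere to go and stay unassigned, which Elasticsearch reports as a "yellow" cluster.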

Depending on your connection settings, you can either connect to one instance (transport client) or join the cluster (node client). With the node client you avoid double hops, since it always connects directly to the node holding the right shard/index. With the transport client, the node you connect to routes your requests to the correct instance.
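The "double hop" can be illustrated with a small model. Elasticsearch chooses a document's shard with a formula that, in its classic form, is hash(_routing) % number_of_primary_shards, where _routing defaults to the document id. A minimal sketch, assuming the primary placement from the six-shard layout above and substituting zlib.crc32 for the Murmur3 hash Elasticsearch actually uses (so the shard numbers will not match a real cluster); shard_for and hops are hypothetical helpers:

```python
import zlib
from typing import Optional

# Primary placement from the six-shard layout above.
PRIMARIES = {1: "node0", 4: "node0", 2: "node1", 6: "node1", 3: "node2", 5: "node2"}
NUM_PRIMARY_SHARDS = 6


def shard_for(doc_id: str) -> int:
    """Choose a shard for a document, numbered 1..6 to match the layout.

    Stand-in for hash(_routing) % number_of_primary_shards, where _routing
    defaults to the document id; zlib.crc32 replaces Elasticsearch's Murmur3.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_PRIMARY_SHARDS + 1


def hops(doc_id: str, contact_node: Optional[str]) -> int:
    """Network hops to reach the primary shard for an indexing request.

    contact_node=None models a cluster-aware client (like the node client)
    that sends straight to the owning node. Otherwise the request lands on
    contact_node, which forwards it when it does not hold the primary.
    Replication from primary to replica is ignored.
    """
    owner = PRIMARIES[shard_for(doc_id)]
    if contact_node is None or contact_node == owner:
        return 1
    return 2


for doc in ["user-1", "user-2", "user-3"]:
    print(doc, "-> shard", shard_for(doc), "| hops via node0:", hops(doc, "node0"))
```

A load balancer in front of the cluster would sit before the contact node in this model, adding a hop on every request without knowing anything about where the shards live.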

So there's nothing for you to load balance yourself; you'd just add overhead. Auto-clustering is probably ES's greatest strength.
