Is using a load balancer with ElasticSearch unnecessary?

Question

I have a cluster of 3 ElasticSearch nodes running on AWS EC2. These nodes are set up using OpsWorks/Chef. My intent is to design this cluster to be very resilient and elastic (nodes can come in and out when needed).

From everything I've read about ElasticSearch, it seems that no one recommends putting a load balancer in front of the cluster; instead, the recommendation seems to be one of two things:

  1. Point your client at the URL/IP of one node, let ES do the load balancing for you, and hope that node never goes down.

  2. Hard-code the URLs/IPs of ALL your nodes into your client app and have the app handle the failover logic.
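
Option 2 needs only a few lines of client code. Below is a minimal sketch. It assumes a hypothetical send function that performs one request against a node's base URL and raises ConnectionError when that node is unreachable. The host names are made up. Most official Elasticsearch client libraries already accept a list of hosts, so treat this as an illustration of the failover logic, not something to ship.

```python
from typing import Callable, Sequence


class AllNodesFailed(Exception):
    """Raised when none of the configured nodes answered."""


def send_with_failover(
    nodes: Sequence[str],
    send: Callable[[str], str],
    start: int = 0,
) -> str:
    """Send one request, trying each node in turn until one answers.

    `send` is a hypothetical transport function: it takes a node's base URL
    and returns the response body, or raises ConnectionError if the node is
    unreachable. Rotating `start` between requests spreads the load
    round-robin across healthy nodes.
    """
    errors = []
    for offset in range(len(nodes)):
        node = nodes[(start + offset) % len(nodes)]
        try:
            return send(node)
        except ConnectionError as exc:
            errors.append((node, exc))  # record the failure, try the next node
    raise AllNodesFailed(errors)


# Hypothetical host names; node 0 is simulated as down.
NODES = ["http://es-node-0:9200", "http://es-node-1:9200", "http://es-node-2:9200"]


def fake_send(url: str) -> str:
    if url == "http://es-node-0:9200":
        raise ConnectionError("node 0 is down")
    return f"200 OK from {url}"


print(send_with_failover(NODES, fake_send))  # → 200 OK from http://es-node-1:9200
```

A load balancer does the same job in option 1's position. The difference is that the list of nodes and the health checking live in the application instead of in a separate component.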

My background is mostly in web farms, where it's just common sense to create a huge pool of autonomous web servers, throw an ELB in front of them, and let the load balancer decide which nodes are alive or dead. Why does ES not seem to support this same architecture?

Answer

You don't need a load balancer; ES already provides that functionality. You'd just be adding another component, which could misbehave and would add an unnecessary network hop.

ES will split your data into primary shards (5 by default before Elasticsearch 7.0, which changed the default to 1) and will try to distribute them evenly among your instances. In your case, two instances would hold 2 shards each and the third just one, so you might want to use 6 shards instead for an equal distribution.
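
The arithmetic behind "2, 2, 1" versus an even split can be checked with a small sketch. It assumes a simple round-robin placement. Elasticsearch's allocator does not work that way internally, but a balanced cluster ends up with the same per-node counts. The settings dictionary is the body of an index-creation request. number_of_shards cannot be updated on an existing index, so the value has to be chosen when the index is created.

```python
from collections import Counter


def primaries_per_node(num_shards: int, num_nodes: int) -> list[int]:
    """Count primary shards per node under a simple round-robin placement.

    Simplification: Elasticsearch's allocator uses its own balancing logic,
    but a balanced cluster ends up with these same per-node counts.
    """
    counts = Counter(shard % num_nodes for shard in range(num_shards))
    return [counts[node] for node in range(num_nodes)]


print(primaries_per_node(5, 3))  # → [2, 2, 1]
print(primaries_per_node(6, 3))  # → [2, 2, 2]

# Body of an index-creation request (PUT /<your-index>) asking for
# 6 primary shards with 1 replica each.
index_settings = {"settings": {"number_of_shards": 6, "number_of_replicas": 1}}
```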

By default, replication is set to "number_of_replicas": 1, so each shard has one replica. Assuming you are using 6 shards, the layout could look something like this (R marks a replica shard):

  • node0: 1, 4, R3, R6
  • node1: 2, 6, R1, R5
  • node2: 3, 5, R2, R4
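
The layout follows a rule worth stating explicitly. Elasticsearch never places a replica on the same node as its primary, so losing any single node still leaves at least one copy of every shard. The sketch below encodes the layout above as plain sets and checks that property. The Layout type and the is_valid helper are illustrative, not part of any Elasticsearch API.

```python
Layout = dict[str, tuple[set[int], set[int]]]

# The 6-shard layout above: node -> (primary shards, replica shards)
LAYOUT: Layout = {
    "node0": ({1, 4}, {3, 6}),
    "node1": ({2, 6}, {1, 5}),
    "node2": ({3, 5}, {2, 4}),
}


def is_valid(layout: Layout) -> bool:
    """True if every shard has exactly one primary and one replica, and no
    node holds both copies of the same shard."""
    primaries = sorted(s for prim, _ in layout.values() for s in prim)
    replicas = sorted(s for _, rep in layout.values() for s in rep)
    one_copy_each = primaries == replicas == sorted(set(primaries))
    no_colocation = all(not (prim & rep) for prim, rep in layout.values())
    return one_copy_each and no_colocation


print(is_valid(LAYOUT))  # → True
```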

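Assuming node1 dies, the replicas R6 (on node0) and R2 (on node2) are promoted to primaries, and the lost copies are re-replicated on the surviving nodes, so the cluster would change to the following setup: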

  • node0: 1, 4, 6, R3 + new replicas R5, R2
  • node2: 3, 5, 2, R4 + new replicas R1, R6
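
The recovery can be modeled as two steps. First, promote the surviving replica of each lost primary. Second, give every shard that lost a copy a new replica on a node that holds no copy of it yet. The sketch below is a simplified model under those assumptions, not Elasticsearch's actual allocator. With only two nodes left, each placement is forced, so it reproduces the layout above exactly.

```python
Layout = dict[str, tuple[set[int], set[int]]]

# Layout from the answer before the failure: node -> (primaries, replicas)
LAYOUT: Layout = {
    "node0": ({1, 4}, {3, 6}),
    "node1": ({2, 6}, {1, 5}),
    "node2": ({3, 5}, {2, 4}),
}


def fail_node(layout: Layout, dead: str) -> Layout:
    """Return the layout after `dead` leaves a cluster with 1 replica per shard.

    Simplified model, not Elasticsearch's allocator. Assumes no node ever held
    both copies of a shard, so every lost primary still has a replica elsewhere.
    """
    # Copy the surviving nodes so the input layout is left untouched.
    survivors = {n: (set(p), set(r)) for n, (p, r) in layout.items() if n != dead}
    lost_primaries, lost_replicas = layout[dead]

    # Step 1: promote the surviving replica of each lost primary.
    for shard in lost_primaries:
        for prim, rep in survivors.values():
            if shard in rep:
                rep.discard(shard)
                prim.add(shard)
                break

    # Step 2: every shard that lost a copy gets a new replica on the first
    # surviving node that holds no copy of it yet.
    for shard in lost_primaries | lost_replicas:
        for prim, rep in survivors.values():
            if shard not in prim and shard not in rep:
                rep.add(shard)
                break
    return survivors


for node, (prim, rep) in fail_node(LAYOUT, "node1").items():
    print(node, sorted(prim), "replicas", sorted(rep))
# node0 [1, 4, 6] replicas [2, 3, 5]
# node2 [2, 3, 5] replicas [1, 4, 6]
```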

Depending on how your client connects, you can either connect to one instance (transport client) or join the cluster (node client). With the node client you avoid double hops, because the client knows where every shard lives and sends each request straight to the node holding the right shard/index. With the transport client, the node you connect to routes your requests to the correct instance.
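
The "double hop" comes from how a single-document request finds its shard. In its basic form, Elasticsearch computes shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document ID. If the node that receives the request holds no copy of that shard, it forwards the request. The sketch below makes several assumptions. It uses zlib.crc32 as a stand-in for Elasticsearch's Murmur3 hash, numbers shards 1 to 6 to match the layout above, and uses a hypothetical document ID. It only covers single-document operations; a search fans out to every shard no matter which node receives it.

```python
import zlib

NUM_PRIMARIES = 6

# Nodes holding a copy (primary or replica) of each shard in the layout above.
SHARD_HOLDERS = {
    1: {"node0", "node1"},
    2: {"node1", "node2"},
    3: {"node2", "node0"},
    4: {"node0", "node2"},
    5: {"node2", "node1"},
    6: {"node1", "node0"},
}


def shard_for(doc_id: str) -> int:
    """Shard number (1-based, to match the layout) for a document ID.

    Mirrors hash(_routing) % number_of_primary_shards, but with crc32
    standing in for Elasticsearch's Murmur3 hash, so a real cluster will
    place a given ID on a different shard.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_PRIMARIES + 1


def hops(doc_id: str, contacted_node: str) -> int:
    """1 if the contacted node holds the shard, 2 if it must forward."""
    return 1 if contacted_node in SHARD_HOLDERS[shard_for(doc_id)] else 2


doc_id = "order-42"  # hypothetical document ID
for node in ("node0", "node1", "node2"):
    print(node, "shard", shard_for(doc_id), "hops", hops(doc_id, node))
```

With one replica per shard on three nodes, exactly one of the three nodes costs the extra hop for any given document. A client that knows the cluster state can always pick one of the other two.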

So there's nothing for you to load balance yourself; a load balancer would just add overhead. Auto-clustering is probably ES's greatest strength.
