Fault tolerance and topology transparency of multi-node DSE Cluster


Problem description

I have the following DSE cluster setup:

DC Cassandra

  • Cassandra node 1

DC Solr

  • Solr node 1
  • Solr node 2
  • Solr node 3
  • Solr node 4

The replication factor is 1 for each DC.

My questions:

  1. To perform a search, I send a Solr SELECT query to a specific node. This introduces a single point of failure: if the node is down, the query fails. Is there a way to "query the cluster/DC" instead of querying a specific node?
  2. In order for the result set to be complete, I need to manually specify the other nodes via the "shards" parameter. Is this the expected behavior, or have I misconfigured something? My expectation is that this should be automatic. I don't want to have to edit my app's source code every time I add a node to the cluster.
  3. Following up on questions #1 and #2, if any other node (aside from the specific node where I send the Solr query) is down, most of the time I get an error like "Unavailable shards for ranges..." or "Server connection refused at...". Again, this breaks fault tolerance. Is it possible to make the cluster return partial results (i.e. only data from the available nodes)?

Overall, my goals are:

  1. Make the app as fault-tolerant as possible - if any of the nodes are down, the app should still display partial results from the remaining nodes
  2. Make the underlying DSE topology transparent to the app. I should not need to edit the app's source code / config every time a node is added or removed

Solution

About your specific questions:

1) Falling back to another server when the requested one is unavailable is akin to client-side load balancing, which is usually implemented in the client. We rely on standard Cassandra and Solr clients, so you have to build this on top of them.
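As a rough illustration of that kind of client-side fallback, here is a minimal Python sketch using only the standard library. It is not part of DSE or of any Solr client: `http_get` and `query_with_failover` are hypothetical helper names, and the idea is simply to try each node's URL in turn until one answers.

```python
import urllib.request
from typing import Callable, Iterable


def http_get(url: str, timeout: float = 5.0) -> bytes:
    """Fetch a URL with the standard library and return the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def query_with_failover(
    node_urls: Iterable[str],
    fetch: Callable[[str], bytes] = http_get,
) -> bytes:
    """Try each node URL in order and return the first successful response."""
    errors = []
    for url in node_urls:
        try:
            return fetch(url)
        except OSError as exc:
            # URLError, timeouts and refused connections are all OSError subclasses.
            errors.append(f"{url}: {exc}")
    # Every node failed (or the list was empty): report what went wrong.
    raise ConnectionError("all nodes failed: " + "; ".join(errors))
```

A caller would pass the same select URL built for each Solr node, for example one URL per host in the Solr DC. Keeping that host list in configuration or in a service registry, rather than in source code, avoids editing the application when nodes are added. Note that this only helps when the contacted node itself is down; an "Unavailable shards" error would be returned by every node.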

2) No, you must not use the "shards" parameter: just send your query to any of the DSE Solr nodes, and it will be transparently distributed.
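A small sketch of what such a request could look like from Python, with no "shards" parameter at all. The helper name `build_select_url`, the port 8983 and the `/solr/<keyspace>.<table>/select` path are assumptions about a typical setup, so adjust them to your installation.

```python
from urllib.parse import urlencode


def build_select_url(node: str, core: str, params: dict) -> str:
    """Build a Solr select URL for one node, leaving out any 'shards' parameter."""
    # Drop 'shards' so the node that receives the query distributes it itself.
    clean = {key: value for key, value in params.items() if key != "shards"}
    return f"http://{node}/solr/{core}/select?{urlencode(clean)}"
```

For instance, `build_select_url("solr1:8983", "ks.tbl", {"q": "name:foo", "wt": "json"})` produces a URL that can be sent to any one of the Solr nodes; the host and core names here are placeholders.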

3) The "Unavailable shards" error happens because the distributed search query needs to contact all token ranges to provide a correct answer. The usual solution is to increase the replication factor (RF) in order to be able to tolerate RF - 1 node failures; we don't currently support partial results, but we may do so in future versions.
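The relationship between the replication factor and tolerated failures can be illustrated with a toy model. The Python sketch below is a deliberate simplification, not DSE's actual placement logic: it assumes one token range per node, a single ring, and that range i is replicated on node i and the next RF - 1 nodes. The function name `unavailable_ranges` is hypothetical.

```python
from typing import List, Set


def unavailable_ranges(node_count: int, replication_factor: int, down: Set[int]) -> List[int]:
    """Return the token ranges that have no live replica in the toy ring model."""
    missing = []
    for token_range in range(node_count):
        # Range i lives on node i and on the following replication_factor - 1 nodes.
        replicas = {(token_range + offset) % node_count for offset in range(replication_factor)}
        if replicas <= down:  # every replica of this range is down
            missing.append(token_range)
    return missing
```

In this model, with four nodes and a replication factor of 1, losing any single node leaves one range without a replica, which corresponds to the "Unavailable shards" situation. With a replication factor of 2, any single node can be down and every range still has a live replica; two failures are only survivable when the failed nodes do not hold all replicas of the same range.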

Overall, DSE Solr is completely transparent and highly available, provided you set up a proper replication factor to accommodate the number of failures you want to tolerate.
