Solr 4 Adding Shard to existing Cluster

Question

Background: I just finished reading the Apache Solr 4 Cookbook. In it, the author mentions that shards need to be set up wisely because new ones cannot be added to an existing cluster. However, the book was written against Solr 4.0 and I am currently using 4.1. Is this still the case? I wish I hadn't run into this issue, and I'm hoping someone can tell me otherwise.

Question: Am I expected to know how much data I'll store in the future when setting up shards in a SolrCloud cluster? I have played with Solandra and read up on Elasticsearch, but quite honestly I am a fan of Solr as it is (and its large community!). I also like ZooKeeper. Am I stuck for now, or is there a workaround/patch?

If the answer to the question above is NO, could I build a SolrCloud with a bunch of shards (maybe 100 or more), let them grow (internally), and, as my data grows, start peeling them off one by one and moving them onto larger, faster servers with more resources?

Recommended answer

Yes, of course you can. You have to set up a new Solr server pointing to the same ZooKeeper instance. During bootstrap, the server connects to the ZooKeeper ensemble and registers itself as a cluster member.
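
As a rough illustration of "pointing to the same ZooKeeper instance", the sketch below only assembles the start command for a new node. It assumes the stock Jetty launcher shipped with Solr 4.x (`start.jar`) and the `zkHost` and `jetty.port` system properties; the host names and the helper name `solr_start_command` are made up for the example.

```python
def solr_start_command(zk_hosts, jetty_port=8983):
    """Build the argv for starting a Solr 4.x node that joins an existing cluster.

    zk_hosts: list of "host:port" strings for the ZooKeeper ensemble
              (hypothetical values in the example below).
    """
    # zkHost takes a comma-separated list of ZooKeeper addresses.
    zk_host = ",".join(zk_hosts)
    return [
        "java",
        f"-DzkHost={zk_host}",
        f"-Djetty.port={jetty_port}",
        "-jar",
        "start.jar",
    ]


if __name__ == "__main__":
    cmd = solr_start_command(
        ["zk1.example.com:2181", "zk2.example.com:2181"], jetty_port=8983
    )
    # Print the command; run it from the Solr example directory of the new server.
    print(" ".join(cmd))
```

The command would be run on the new machine; once the node is up it should appear in the cluster state held by ZooKeeper.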

Once the registration process is complete, the server is ready to create new cores. You can create replicas of the existing shards using CoreAdmin. You can also create new shards, but the cluster will not be rebalanced: because of the Lucene index format (not all fields are stored), the index may not hold all the document information needed to redistribute existing documents, so only newly indexed or updated documents will reach this server. Doing this is not recommended.
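
To make the CoreAdmin step more concrete, here is a minimal sketch that builds a CoreAdmin `CREATE` request URL for a new replica of an existing shard. The `action`, `name`, `collection` and `shard` parameters are the ones CoreAdmin accepts in SolrCloud mode; the host, core, collection and shard names are placeholders, and the helper `create_replica_url` is hypothetical.

```python
from urllib.parse import urlencode


def create_replica_url(solr_base, core_name, collection, shard):
    """Return the CoreAdmin CREATE URL for a new core on the node at solr_base.

    Naming an existing collection and shard makes the new core a replica
    of that shard. All names passed in here are illustrative.
    """
    params = {
        "action": "CREATE",
        "name": core_name,
        "collection": collection,
        "shard": shard,
    }
    return f"{solr_base.rstrip('/')}/admin/cores?{urlencode(params)}"


if __name__ == "__main__":
    # Target the NEW server, since that is where the core will live.
    print(
        create_replica_url(
            "http://newhost.example.com:8983/solr",
            core_name="docs_shard1_replica2",
            collection="docs",
            shard="shard1",
        )
    )
```

The URL can then be requested with a browser, `curl`, or any HTTP client; the sketch deliberately stops short of sending it.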

When you set up SolrCloud, you have to size the cluster with your document growth in mind. For example, if you start with 1M documents and grow by 10k docs/day, set up the cluster with 5 shards. At first you will have to host these shards on your initial two machines, but later, as needed, you can add new servers to the cluster and move the shards onto them. Be careful not to overgrow your cluster because, in Lucene, a single 20 GB index split across 5 shards will not be a 4 GB index in every shard. Each shard will take about (single_index_size/num_shards)*1.1 (due to dictionary compression), so roughly 4.4 GB per shard in this example. This may change depending on your term frequency.
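
The rule of thumb above is easy to turn into a quick calculation. The sketch below applies (single_index_size/num_shards)*1.1 as given; the 1.1 factor is the answer's estimate rather than a measured value, and the function name is invented for the example.

```python
def estimated_shard_size_gb(single_index_size_gb, num_shards, overhead=1.1):
    """Estimate per-shard index size using (single_index_size / num_shards) * overhead.

    overhead=1.1 is the rough factor quoted above; the real value depends
    on term frequency and should be measured on your own data.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    return single_index_size_gb / num_shards * overhead


if __name__ == "__main__":
    per_shard = estimated_shard_size_gb(20, 5)
    total = per_shard * 5
    # A 20 GB index over 5 shards: about 4.4 GB per shard, about 22 GB in total.
    print(f"per shard: {per_shard:.1f} GB, total: {total:.1f} GB")
```

The point of the total is that splitting costs disk space: the shards together occupy more than the original single index.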

The last option you have is to add the new servers to the cluster and, instead of adding new shards/replicas to the existing server, set up a new, separate collection using your new shards and reindex into this new collection in parallel. Then, once the reindexing has finished, swap this collection with the old one.
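
One possible way to script the "new collection, then swap" approach is through the Collections API. The sketch below only builds the request URLs. `CREATE` with `numShards` and `replicationFactor` is available in Solr 4.x; using `CREATEALIAS` for the swap is an assumption on my part, since collection aliases arrived, as far as I know, in Solr 4.2, which is later than the 4.1 mentioned in the question. Collection names, the alias name and the helper functions are all hypothetical.

```python
from urllib.parse import urlencode


def collections_api_url(solr_base, **params):
    """Build a Collections API URL from keyword parameters."""
    return f"{solr_base.rstrip('/')}/admin/collections?{urlencode(params)}"


def create_collection_url(solr_base, name, num_shards, replication_factor=1):
    """URL that creates a new collection with the desired shard count."""
    return collections_api_url(
        solr_base,
        action="CREATE",
        name=name,
        numShards=num_shards,
        replicationFactor=replication_factor,
    )


def point_alias_url(solr_base, alias, collection):
    """URL that points an alias at a collection (assumes alias support, Solr 4.2+)."""
    return collections_api_url(
        solr_base, action="CREATEALIAS", name=alias, collections=collection
    )


if __name__ == "__main__":
    base = "http://localhost:8983/solr"
    # 1. Create the new, larger collection, then reindex into it in parallel.
    print(create_collection_url(base, "docs_v2", num_shards=10))
    # 2. After reindexing, repoint the alias that clients query.
    print(point_alias_url(base, "docs", "docs_v2"))
```

With this layout clients always query the alias, so the swap does not require changing them. On a version without aliases, the swap would have to be done another way, for example by switching the collection name used by the clients.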
