solr multicore vs sharding vs 1 big collection

Question

I currently have a single collection with 40 million documents and an index size of 25 GB. The collection gets updated every n minutes, and as a result the number of deleted documents is constantly growing. The data in the collection is an amalgamation of more than 1000 customers' records. The number of documents per customer is around 100,000 records on average.

That being said, I'm trying to get a handle on the growing number of deleted documents. Because of the growing index size, both disk space and memory are being used up, and I would like to reduce the index to a manageable size.
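Independently of how the data ends up being split, space held by deleted documents can be reclaimed through the update handler, either with a commit that expunges deletes or with a full optimize. A minimal sketch below builds the two request URLs, assuming a hypothetical Solr instance at `localhost:8983` and a collection named `mycollection`:

```shell
# Sketch with hypothetical host/collection names; run the URLs with curl.
SOLR="http://localhost:8983/solr"
COLLECTION="mycollection"

# Commit that also merges away segments dominated by deleted documents:
EXPUNGE_URL="${SOLR}/${COLLECTION}/update?commit=true&expungeDeletes=true"

# Full optimize: rewrites the whole index (expensive in I/O, reclaims all space):
OPTIMIZE_URL="${SOLR}/${COLLECTION}/update?optimize=true&maxSegments=1"

echo "$EXPUNGE_URL"
echo "$OPTIMIZE_URL"
```

In practice you would issue these with `curl "$EXPUNGE_URL"`. The expungeDeletes variant is cheaper than a full optimize because it only merges segments that consist mostly of deletes, which fits an index that is updated every few minutes.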

I have been thinking of splitting the data into multiple cores, one for each customer. This would allow me to manage the smaller collections easily, and creating/updating a collection would also be fast. My concern is that the number of collections might become an issue. Any suggestions on how to address this problem?
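For reference, creating one core per customer can be scripted against the CoreAdmin API. A minimal sketch with a hypothetical host and customer list follows; on Solr 4.x each core needs its own instanceDir containing a conf/ directory, typically copied from a shared template:

```shell
# Sketch (hypothetical names): build a CoreAdmin CREATE call per customer.
SOLR="http://localhost:8983/solr"
CUSTOMERS="acme globex initech"

for CUSTOMER in $CUSTOMERS; do
  CORE="customer_${CUSTOMER}"
  # Each core gets its own instanceDir; its conf/ is copied from a template.
  echo "${SOLR}/admin/cores?action=CREATE&name=${CORE}&instanceDir=${CORE}"
done
```

Each printed URL would be issued with curl as part of customer onboarding; the core name doubles as the routing key for that customer's queries and updates.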

Solr: 4.9
Index size: 25 GB
Max doc: 40 million
Doc count: 29 million

Thanks

Answer

I had a similar issue, with multiple customers and a large amount of indexed data.

I implemented it with version 3.4 by creating a separate core per customer.

That is, one core per customer. Creating a core is essentially creating a separate index, splitting the data much as we do in the case of sharding.

Here you are splitting the large indexed data into smaller, separate segments.

Whatever search happens will be carried out within the smaller indexed segment, so the response time will be faster.

I have almost 700 cores created as of now, and it's running fine for me.

As of now I have not faced any issues with managing the cores.

I would suggest going with a combination of cores and sharding.

It will help you achieve the following:

It allows a different configuration for each core, with different behavior, without any impact on the other cores.

You can perform actions like update, load, etc. on each core independently.
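As an illustration of that independence (hypothetical host and core names), updates and reloads are addressed to a single core, so acting on one customer's core leaves all the others untouched:

```shell
# Sketch: per-core operations with hypothetical names.
SOLR="http://localhost:8983/solr"

# Commit an update for only this customer's core:
UPDATE_URL="${SOLR}/customer_acme/update?commit=true"

# Reload only this core after a config change; other cores keep serving:
RELOAD_URL="${SOLR}/admin/cores?action=RELOAD&core=customer_acme"

echo "$UPDATE_URL"
echo "$RELOAD_URL"
```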
