Solr indexing issue (out of memory) - looking for a solution


Question

I have a large index of 50 million docs, all running on the same machine (no sharding). I don't have an ID that would let me update only the docs I want, so for each update I must delete the whole index, index everything from scratch, and commit only at the end when I'm done indexing.

My problem is that every few index runs, Solr crashes with an out-of-memory exception. I am running with 12.5 GB of memory. From what I understand, everything is kept in memory until the commit, so I'm holding 100M docs in memory instead of 50M. Am I right? But I cannot commit while I'm indexing, because I deleted all the docs at the beginning, and then I would be running with a partial index, which is bad.

Are there any known solutions for this? Can sharding solve it, or am I still going to have the same problem? Is there a flag that lets me make soft commits without changing the index until the hard commit?

Recommended answer

You can use master-slave replication. Dedicate one machine to indexing (the master Solr), and once indexing is finished, tell the slave to replicate the index from the master. The slave downloads the new index and deletes the old one only if the download succeeds, so it's quite safe.

http://wiki.apache.org/solr/SolrReplication
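
As a rough illustration of "telling the slave to replicate", the sketch below builds the `fetchindex` request described on the replication wiki page and sends it to the slave. The host names, port, and core name (`docs`) are hypothetical, and the URL layout assumes a core-based setup (`/solr/<core>/replication`); adjust it to match your installation.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def fetchindex_url(slave_base, core, master_url=None):
    """Build the replication URL that asks a slave to pull the index.

    slave_base and core are placeholders for your own setup.
    master_url is optional; if omitted, the slave uses the master
    configured in its own solrconfig.xml.
    """
    params = {"command": "fetchindex"}
    if master_url is not None:
        params["masterUrl"] = master_url
    return f"{slave_base.rstrip('/')}/{core}/replication?{urlencode(params)}"


def trigger_replication(slave_base, core, master_url=None, timeout=30):
    """Send the fetchindex command and return the HTTP status code."""
    url = fetchindex_url(slave_base, core, master_url)
    with urlopen(url, timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    # Hypothetical slave address; only prints the URL, no request is sent.
    print(fetchindex_url("http://solr-slave:8983/solr", "docs"))
```

You would call `trigger_replication(...)` from the indexing job only after the final commit on the master, so the slave never sees a half-built index.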

Another solution, which avoids the replication setup entirely, is to put a reverse proxy such as nginx in front of Solr. Use one machine for indexing the new data and the other for searching, and have the reverse proxy always point at the one that is not currently indexing.
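
A minimal sketch of the switching logic, assuming two Solr machines and an nginx `upstream` block that you regenerate before reloading nginx. The backend addresses and the upstream name `solr_search` are made up for illustration; how you write the file and reload the proxy depends on your deployment.

```python
def search_backend(backends, indexing):
    """Return the single backend that is not currently indexing."""
    idle = [b for b in backends if b != indexing]
    if len(idle) != 1:
        raise ValueError("expected exactly one idle backend")
    return idle[0]


def render_upstream(backend, name="solr_search"):
    """Render an nginx upstream block pointing at the given backend."""
    return f"upstream {name} {{\n    server {backend};\n}}\n"


if __name__ == "__main__":
    # Hypothetical machines: solr-a is being reindexed, so search goes to solr-b.
    machines = ["solr-a:8983", "solr-b:8983"]
    print(render_upstream(search_backend(machines, indexing="solr-a:8983")))
```

After each full reindex the roles swap: the freshly indexed machine becomes the search backend and the other one is free for the next rebuild.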

If you do either of these, you can commit as often as you want.

And because it's generally a bad idea to do indexing and searching on the same machine, I would prefer the master-slave solution (not to mention that you have 50M docs).
