How to configure Solr for improved indexing speed


Question

I have a client program that generates between 1 and 50 million Solr documents and adds them to Solr.
I'm using ConcurrentUpdateSolrServer to push the documents from the client, 1000 documents per request.
The documents are relatively small (a few small text fields).
I want to improve the indexing speed.
I've tried increasing "ramBufferSizeMB" to 1G and "mergeFactor" to 25, but didn't see any change.
Are there other recommended settings for improving Solr indexing speed?
Any links to relevant material would be appreciated.

Recommended answer

It looks like you are doing a bulk import of data into Solr, so you don't need the data to be searchable right away.

First, you can increase the number of documents per request. Since your documents are small, I would try raising it to 100K docs per request or more and measure the effect.

Second, you want to reduce the number of commits that happen while you are bulk indexing. In your solrconfig.xml, look for:

<!-- AutoCommit

     Perform a hard commit automatically under certain conditions.
     Instead of enabling autoCommit, consider using "commitWithin"
     when adding documents.

     http://wiki.apache.org/solr/UpdateXmlMessages

     maxDocs - Maximum number of documents to add since the last
               commit before automatically triggering a new commit.

     maxTime - Maximum amount of time in ms that is allowed to pass
               since a document was added before automatically
               triggering a new commit.

     openSearcher - if false, the commit causes recent index changes
     to be flushed to stable storage, but does not cause a new
     searcher to be opened to make those changes visible.
  -->
 <autoCommit>
   <maxTime>15000</maxTime>
   <openSearcher>false</openSearcher>
 </autoCommit>

You can disable autoCommit altogether and then issue a single commit after all your documents are posted. Otherwise, you can tweak the numbers as follows:
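As a rough illustration of the single commit mentioned above, Solr's XML update format accepts a commit message like the one below. It would be posted to the core's update handler. The exact URL depends on your installation and is not given in the original answer, so treat the delivery mechanism as an assumption.

```xml
<!-- Sent once, after every document in the bulk import has been posted. -->
<commit/>
```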

The default maxTime is 15 seconds (15000 ms), so an auto commit happens every 15 seconds whenever there are uncommitted docs. You can set it to something large, say 3 hours (3*60*60*1000 = 10800000 ms). You can also add <maxDocs>50000000</maxDocs>, which triggers an auto commit only after 50 million documents have been added. After you post all your documents, call commit once, either manually or from SolrJ. That commit will take a while, but the import will be much faster overall.
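Putting those numbers together, a bulk-import autoCommit block might look like the sketch below. The values are simply the ones suggested in this answer (3 hours and 50 million documents), and openSearcher is carried over from the stock config shown earlier. Adjust them to the size of your own import.

```xml
<!-- Bulk-import settings: a sketch using the values suggested in the answer. -->
<autoCommit>
  <!-- 3 hours = 3 * 60 * 60 * 1000 ms -->
  <maxTime>10800000</maxTime>
  <!-- Auto commit only after 50 million documents have been added -->
  <maxDocs>50000000</maxDocs>
  <!-- Kept from the stock config: a hard commit does not open a new searcher -->
  <openSearcher>false</openSearcher>
</autoCommit>
```

With these limits, neither threshold is likely to trigger during a typical import, so the explicit commit at the end does the work.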

Once the bulk import is done, reduce maxTime and maxDocs again so that any incremental updates you post to Solr are committed much sooner. Alternatively, use commitWithin, as mentioned in the solrconfig.xml comment.
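For the commitWithin alternative, the XML update format lets the client attach a deadline to an add request. The following is a minimal sketch. The 10-second value and the field names are hypothetical, chosen only for illustration, so use the fields defined in your own schema.

```xml
<!-- Ask Solr to commit these documents within 10000 ms of receiving them. -->
<add commitWithin="10000">
  <doc>
    <!-- Hypothetical fields; replace with fields from your schema -->
    <field name="id">doc-1</field>
    <field name="title">Example document</field>
  </doc>
</add>
```

This keeps commit timing under the client's control for incremental updates, without relying on a short server-side autoCommit interval.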
