Solr performance with commitWithin does not make sense


Problem description

I am running a very simple performance experiment where I post 2000 documents to my application, which in turn persists them to a relational DB and sends them to Solr for indexing (synchronously, in the same request).

I am testing 3 use cases:


  1. No indexing at all - ~45 seconds to post 2000 documents

  2. Indexing included - committing after each add. ~8 minutes (!) to post and index 2000 documents

  3. Indexing included - commitWithin 1ms. ~55 seconds (!) to post and index 2000 documents


The 3rd result does not make any sense; I would expect the behavior to be similar to that in point 2. At first I thought that the documents were not really committed, but I could actually see them being added by executing some queries during the experiment (via the Solr web UI).


I am worried that I am missing something very big. Is it possible that committing after each add will degrade performance by a factor of 400?!

The code I used for point 2:

SolrInputDocument doc = // get doc
SolrServer solrConnection = // get connection
solrConnection.add(doc);
solrConnection.commit();

And here is the code for point 3:

SolrInputDocument doc = // get doc
SolrServer solrConnection = // get connection
solrConnection.add(doc, 1); // according to the API documentation, I understand there is no need to call an explicit commit after this

Answer

According to this wiki:

https://wiki.apache.org/solr/NearRealtimeSearch


commitWithin is a soft commit by default. Soft commits are very efficient at making the added documents immediately searchable. But! They are not on disk yet. That means the documents are being committed into RAM. In this setup you would use the updateLog to make the Solr instance crash tolerant.
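A minimal sketch of that setup in solrconfig.xml (the element names are standard Solr configuration, but the intervals and directory shown here are illustrative, not taken from the original answer):

```xml
<!-- solrconfig.xml (illustrative values) -->
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- transaction log: lets Solr replay soft-committed (not-yet-flushed) docs after a crash -->
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>

  <!-- soft commit: makes new docs searchable without flushing segments to disk -->
  <autoSoftCommit>
    <maxTime>1000</maxTime> <!-- every 1 s; cheap -->
  </autoSoftCommit>

  <!-- hard commit: flushes to disk; keep this infrequent -->
  <autoCommit>
    <maxTime>600000</maxTime> <!-- every 10 min -->
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```

With `openSearcher=false` the periodic hard commit only persists data; visibility of new documents is still driven by the cheap soft commits.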


What you do in point 2 is a hard commit, i.e. flushing the added documents to disk. Doing this after each document add is very expensive. So instead, post a batch of documents and issue a single hard commit, or even set autoCommit to some reasonable value, like 10 min or 1 hour (depending on your users' expectations).
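The batched approach could look like the following SolrJ sketch (it assumes a running Solr instance; the URL, collection name, and document fields are illustrative, and `SolrServer`/`HttpSolrServer` match the API generation used in the question's own snippets):

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {
    public static void main(String[] args) throws Exception {
        // Illustrative endpoint; point this at your own core/collection
        SolrServer solrConnection = new HttpSolrServer("http://localhost:8983/solr/collection1");

        // Build all 2000 documents in memory first
        List<SolrInputDocument> batch = new ArrayList<>();
        for (int i = 0; i < 2000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", String.valueOf(i));
            batch.add(doc);
        }

        // One add for the whole batch, then a single hard commit
        solrConnection.add(batch);
        solrConnection.commit();

        solrConnection.shutdown();
    }
}
```

This turns 2000 expensive per-document hard commits into one, which is exactly the difference between the ~8 minutes of case 2 and the ~55 seconds of case 3.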

