Apache solr adding/editing/deleting records frequently


Question

I'm thinking about using Apache Solr. My database will have around 10,000,000 records. In the worst case, around 20 of their fields will be searchable/sortable. My problem is that these fields may change values frequently during the day. For example, I might change some fields of 10,000 records at the same time, and this may happen 0, 1, or 1,000 times a day. The point is that each time I update a value in the db I want it updated in Solr too, so every search sees the current data.

For those of you who have used Solr: how fast can re-indexing at these volumes be? Will such an update (a delete and re-add of each record, from what I've read) and its indexing cost, say, 5 seconds, 5 minutes, or an hour? Assume it will be running on a good server.

Answer

It's very hard to tell without actually trying. However, you need to know that Lucene and Solr currently don't support updating individual fields in place (although there is some work in progress: https://issues.apache.org/jira/browse/LUCENE-3837), meaning that you need to re-index the whole record even if you only updated a single field.
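To make that concrete, here is a minimal sketch of what "re-index the whole record" implies on the client side. The field names and the `build_solr_doc` helper are illustrative assumptions, not from the question: the point is that even a one-field change means re-sending the complete document, keyed by its uniqueKey, so Solr can delete the old version and add the new one.

```python
import json

def build_solr_doc(db_row):
    """Map a full database row to a Solr update document.
    Field names here are hypothetical, for illustration only."""
    return {
        "id": db_row["id"],          # uniqueKey: Solr replaces the old doc
        "title": db_row["title"],
        "price": db_row["price"],
        "in_stock": db_row["in_stock"],
    }

row = {"id": "42", "title": "Widget", "price": 9.99, "in_stock": True}
row["price"] = 8.49                       # only one field changed in the db...
payload = json.dumps([build_solr_doc(row)])  # ...but the full doc is sent
print(payload)
```

The payload would then be POSTed to Solr's update handler; the cost of the update is therefore proportional to the whole document, not to the single changed field.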

Moreover, Lucene and Solr are much better at batch updates than single-document updates. To help with this, Solr has a nice commitWithin parameter that lets it group individual updates together to improve throughput.
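As a sketch of how commitWithin is used (the host, port, and core name below are assumptions): instead of issuing an explicit commit per update, the client passes `commitWithin` in milliseconds on the update request, and Solr promises to make the documents searchable within that window, coalescing many small updates into one commit.

```python
import json
from urllib.parse import urlencode

# Hypothetical core "mycore" on a default local Solr install.
# commitWithin=5000 asks Solr to make these docs visible within
# 5 seconds, letting it batch many updates into a single commit.
base = "http://localhost:8983/solr/mycore/update"
url = f"{base}?{urlencode({'commitWithin': 5000})}"

docs = [{"id": str(i), "title": f"doc {i}"} for i in range(3)]
body = json.dumps(docs)  # POST body, Content-Type: application/json

print(url)
print(body)
```

The actual POST is omitted here since it needs a running Solr instance; the trade-off is latency (updates become visible within the window, not instantly) in exchange for much higher indexing throughput.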

Take this number with a grain of salt, but I often build indexes of millions of documents (~30 small fields) at a throughput of ~5000 docs/s on very conventional hardware.
