Solr safe dataimport and core swap on high-traffic website


Question

Hello fellow techies,

Let's assume we have a (PHP) website with millions of visitors a month, and we are running a Solr index of 4 million documents for it. Solr runs on 4 separate servers: one is the master and the other 3 are slaves that replicate from it.

Thousands of documents can be inserted into Solr every 5 minutes. Besides that, users can update their accounts, which should also trigger a Solr update.

I am looking for a strategy to rebuild the index quickly and safely without missing any documents, and for a safe delta/update strategy. I have thought of one and want to share it with the experts here, to hear whether I should go with this approach or whether they would advise something (totally) different.

Solr DataImport

For all operations I want to use one data-import handler. I want to combine the full and delta imports in one config file, as described in DataImportHandlerDeltaQueryViaFullImport (http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport). We are using a MySQL database as the data source.
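As a rough illustration of that approach, the full rebuild and the delta run can share the same `full-import` command and differ only in the `clean` request parameter. The sketch below only builds the two request URLs. The host, port, core names and the `/dataimport` handler path are assumptions for illustration, not values taken from the setup described here.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed host and port


def dataimport_url(core, clean, base=SOLR_BASE):
    """Build a DIH full-import URL.

    clean=True wipes the index first (full rebuild); clean=False keeps
    existing documents, so the same command acts as the delta run.
    """
    params = urlencode({
        "command": "full-import",
        "clean": "true" if clean else "false",
        "commit": "true",
    })
    # "/dataimport" is the conventional handler name; it is whatever
    # solrconfig.xml registers, so treat it as an assumption.
    return f"{base}/{core}/dataimport?{params}"


full_rebuild = dataimport_url("reindex", clean=True)
delta_update = dataimport_url("live", clean=False)
```

With this pattern the SQL in the config file would branch on `${dataimporter.request.clean}`, as the linked wiki page describes.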

Rebuilding the index

For rebuilding the index I have the following in mind: we create a new core called 'reindex' next to the 'live' core. With the DataImportHandler we completely rebuild the whole document set (4 million documents), which takes about 1-2 hours in total. Meanwhile the live index still receives updates, inserts and deletions every minute.

After the rebuild, which took about 1-2 hours, the new index is no longer fully up-to-date. To shrink that gap we run one 'delta' import against the new core to commit all changes from the last 1-2 hours. When this is done, we do a core swap. The normal 'delta' import, which runs every minute, will then pick up this new core.
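The three steps above (full rebuild, catch-up delta, swap) can be sketched as an ordered list of HTTP GET requests. This is only a sketch: the base URL, the core names `live` and `reindex`, the `/dataimport` handler path and the `/admin/cores` admin path are assumed defaults, and the helper `rebuild_steps` is hypothetical.

```python
from urllib.parse import urlencode


def rebuild_steps(base="http://localhost:8983/solr", live="live", spare="reindex"):
    """Return the ordered request URLs for a rebuild followed by a core swap."""

    def dih(core, clean):
        query = urlencode({"command": "full-import", "clean": str(clean).lower()})
        return f"{base}/{core}/dataimport?{query}"

    swap = urlencode({"action": "SWAP", "core": live, "other": spare})
    return [
        dih(spare, True),              # 1. full rebuild on the spare core
        dih(spare, False),             # 2. catch-up delta on the spare core
        f"{base}/admin/cores?{swap}",  # 3. swap the spare core into place
    ]


steps = rebuild_steps()
```

An import request normally returns before the import has finished, so a real script would have to wait for each import to complete before issuing the next request.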

Committing updates to the live core

To keep the live core on track we run the delta import every minute. Because of the core swap, the reindex core (which is now the live core) will be tracked and kept up-to-date. I am guessing it should not really be a problem if this index is a few minutes behind, because dataimport.properties will be swapped as well? The delta import has to catch up on these minutes of delay, but that should be possible.

I hope you understand my situation and my strategy and can advise whether I'm doing it the right way in your eyes. I would also like to know if there are any bottlenecks I haven't thought about. We are running Solr version 1.4.

One question I do have is: what about replication? If the master server swaps the core, how do the slaves handle this?

And is there any risk of losing documents when swapping, etc.?

Thanks in advance!

Recommended answer

Good (hard) question!

The full-import is a very heavy operation; in general it's better to run delta queries that only bring your index up to date with the latest changes in your RDBMS. I see why you swap the master core when you need to do a full-import: you keep the live core up-to-date using delta-import while the full-import runs on the new core, since it takes two hours. Sounds good, as long as the full-import is not run that frequently.

Regarding replication, I would make sure that there isn't any replication in progress before swapping the master core. For more details about how replication works, have a look at the Solr wiki if you haven't done so yet.
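One possible way to do that, offered as an assumption rather than a tested recipe, is to pause replication on the master around the swap using the ReplicationHandler's `disablereplication` and `enablereplication` commands, and to inspect `details` in between. The sketch below only builds those URLs. The host name `master`, the port and the core name are placeholders.

```python
from urllib.parse import urlencode

# Subset of ReplicationHandler commands used in this sketch.
ALLOWED = {"details", "disablereplication", "enablereplication"}


def replication_url(command, core="live", base="http://master:8983/solr"):
    """Build a ReplicationHandler URL for the master core (placeholder host)."""
    if command not in ALLOWED:
        raise ValueError(f"unsupported replication command: {command}")
    return f"{base}/{core}/replication?{urlencode({'command': command})}"


pause = replication_url("disablereplication")   # before the swap
inspect = replication_url("details")            # check current state
resume = replication_url("enablereplication")   # after the swap
```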

Furthermore, I would make sure that there isn't any delta-import running on the live core before swapping the master core.
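A check like that could look at the response of the DataImportHandler's `command=status` request, which reports whether an import is `idle` or `busy`. The following is a minimal sketch that parses such a response. The sample XML strings are hand-written approximations of the response shape, not captured output, and `dih_is_idle` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def dih_is_idle(status_xml):
    """Return True if a DIH status response reports the handler as idle."""
    root = ET.fromstring(status_xml)
    # Only look at the top-level <str name="status">; the responseHeader
    # also contains an <int name="status"> that must be ignored.
    node = root.find("./str[@name='status']")
    return node is not None and (node.text or "").strip() == "idle"


# Hand-written approximations of a status response (assumed shape).
SAMPLE_IDLE = (
    '<response><lst name="responseHeader"><int name="status">0</int></lst>'
    '<str name="status">idle</str></response>'
)
SAMPLE_BUSY = SAMPLE_IDLE.replace(">idle<", ">busy<")
```

A swap script would poll the status URL of the live core and only issue the SWAP request once `dih_is_idle` returns true.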
