Solr incremental backup on a real-time system with a heavy index


Question

I am building a search engine with Solr that imports at least 2 million documents per day. Users must be able to search the imported documents as soon as possible (near real-time).

I am using 2 dedicated Windows x64 servers with Tomcat 6 (Solr shard mode). Each server indexes about 120 million documents, about 220 GB (500 GB in total).

I want to take incremental backups of the Solr index files while updates or searches are running. After some searching I found the rsync tool for UNIX and DeltaCopy for Windows (a GUI rsync for Windows), but I get a "vanished" error when the backup runs during an update.

How can I solve this problem?

Note 1: A plain file copy is really slow when the files are very large, so I can't use that approach.

Note 2: Can I prevent index file corruption during an update if Windows crashes, the hardware resets, or some other problem occurs?

Recommended answer

Don't run a backup while updating the index. You will probably get a corrupt (and therefore useless) backup.

Some workarounds:

  • Batch up your updates: instead of adding/updating documents all the time, add/update every n minutes. This lets you run the backup in the gaps between those batches. Cons: document freshness is affected.
  • Use a second, passive Solr core: set up two cores per shard, one active and one passive. All queries are issued against the active core. Use replication to keep the passive core up to date, and run the backup against the passive core. You'd have to disable replication while the backup is running. Cons: complex, more moving parts, and double the disk space is needed to maintain the passive core.
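
The second workaround could be scripted roughly as follows. This is only a sketch, assuming the passive core is configured as a replication slave and exposes Solr's ReplicationHandler at `/replication`; `disablepoll` and `enablepoll` are that handler's commands for pausing and resuming polling. The function names, the host, port, and core name, and the `run_backup` callback (whatever copy tool you use, e.g. rsync or DeltaCopy) are hypothetical. It also does not wait for a fetch that is already in progress when polling is paused.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def replication_url(core_url, command):
    """Build a ReplicationHandler URL such as .../replication?command=disablepoll."""
    return f"{core_url.rstrip('/')}/replication?{urlencode({'command': command})}"


def http_get(url):
    """Default transport: plain HTTP GET, returns the response body as text."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def backup_passive_core(core_url, run_backup, send=http_get):
    """Pause polling on the passive core, run the backup, then resume polling."""
    send(replication_url(core_url, "disablepoll"))
    try:
        run_backup()  # hypothetical callback: copy the passive core's index files
    finally:
        # Resume replication even if the backup step fails.
        send(replication_url(core_url, "enablepoll"))


if __name__ == "__main__":
    # Dry run: print the URLs instead of calling Solr (hypothetical host and core).
    passive = "http://localhost:8080/solr/passive"
    backup_passive_core(passive, lambda: print("copy index files here"), send=print)
```

Passing the transport in as `send` keeps the sequence testable without a running Solr instance; in real use the default `http_get` would issue the requests.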
