Does a schema change require re-indexing all Solr documents, or just the documents containing the changed schema fields?

Question

I have millions of documents in my Solr index. Only a thousand of them have field A, whose schema I want to change. The changes are: multiValued from true to false, stored from false to true, and type from text to string, all of which require a re-index. Re-indexing the thousand documents will take a few minutes, whereas re-indexing everything will take days.

The re-indexing page on the Solr wiki (http://wiki.apache.org/solr/HowToReindex) says "you may need to delete all documents before you begin your indexing process", but doesn't say when you don't need to.

Can I delete just the thousand documents containing field A and re-index those thousand, or do I need to delete the entire index (all documents) and then re-index everything?

I've tested the "delete only the few" scenario on a small sample index, and updates and queries work as expected on the changed field. However, I don't know whether I just got lucky and problems are lurking because I didn't delete everything.

Recommended answer

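  • If you index documents with the same id (the unique key defined in your schema.xml), you don't have to delete them before indexing. Indexing a document with an existing id overwrites the existing document.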
  • Keep in mind that when you index a document with the same id, the old document is automatically marked as "deleted" but is not physically removed from the index. Term vector analysis still applies to all documents, including the deleted ones.

    If you need to physically clean up deleted documents, run an index "Optimize", which you can do from the Solr admin interface.

  • If you change the schema, you don't have to re-index everything. Re-indexing only the affected documents is enough.
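
To make the "marked as deleted, but not physically removed" behaviour in the list above concrete, here is a toy in-memory model. It is only an illustration: the `ToyIndex` class and its methods are invented for this sketch and are not Solr's or Lucene's actual data structures. The `num_docs` and `max_doc` names loosely mirror the numDocs and maxDoc counters shown in the Solr admin interface.

```python
class ToyIndex:
    """Hypothetical append-only index, used only to illustrate overwrite semantics."""

    def __init__(self):
        self._slots = []  # every document version ever written, as [doc, is_live]
        self._live = {}   # unique key -> position of the live version in _slots

    def add(self, doc):
        """Add a document; an existing version with the same id is only marked deleted."""
        old_pos = self._live.get(doc["id"])
        if old_pos is not None:
            self._slots[old_pos][1] = False  # old version stays in place, flagged deleted
        self._slots.append([doc, True])
        self._live[doc["id"]] = len(self._slots) - 1

    def optimize(self):
        """Physically drop the versions that were marked deleted."""
        self._slots = [slot for slot in self._slots if slot[1]]
        self._live = {slot[0]["id"]: pos for pos, slot in enumerate(self._slots)}

    def get(self, doc_id):
        """Return the live version of a document."""
        return self._slots[self._live[doc_id]][0]

    @property
    def num_docs(self):
        return len(self._live)   # live documents only

    @property
    def max_doc(self):
        return len(self._slots)  # live documents plus deleted ones


index = ToyIndex()
index.add({"id": "1", "A": ["old", "multi-valued"]})
index.add({"id": "2", "title": "document without field A"})
index.add({"id": "1", "A": "new single value"})  # same id: overwrites document 1

print(index.num_docs, index.max_doc)  # 2 3
index.optimize()
print(index.num_docs, index.max_doc)  # 2 2
```

Document 2, which never had field A, is untouched throughout; only the overwritten version of document 1 lingers until the optimize step.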

    So if I were in your place, I would not delete anything at all. I would re-index only the thousand or so affected documents, then run an optimize afterwards to clean up the index.
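
The approach recommended above (re-post the affected documents under their existing ids, then optimize) could be sketched roughly as follows. This is a minimal sketch under assumptions, not a definitive procedure: the host, port, and core name `mycore` are placeholders, the example documents are made up, and it assumes a Solr version whose `/update` handler accepts a JSON array of documents. The functions only build the HTTP requests; `send` is the only part that talks to a server.

```python
import json
import urllib.request

# Assumed location of the update handler; adjust host, port, and core name.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update"


def build_reindex_request(docs, base_url=SOLR_UPDATE_URL):
    """Build a POST that re-adds full documents under their existing ids.

    A document posted with an id that already exists replaces the old one,
    so each document must be sent in full, not just field A.
    """
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        base_url + "?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_optimize_request(base_url=SOLR_UPDATE_URL):
    """Build the request that asks Solr to optimize, purging deleted documents."""
    return urllib.request.Request(base_url + "?optimize=true")


def send(request):
    """Send a prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request) as response:
        return response.status


# Hypothetical affected documents; field A is now a single stored string value.
affected_docs = [
    {"id": "doc-1", "A": "first value"},
    {"id": "doc-2", "A": "second value"},
]

reindex_request = build_reindex_request(affected_docs)
optimize_request = build_optimize_request()

# Against a running Solr instance with the updated schema loaded:
# send(reindex_request)
# send(optimize_request)
```

Optimizing a large index can itself take a long time, so it is reasonable to run that step later, during a quiet period.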
