Adding child documents to existing Solr 6.4 collection documents creates duplicate documents

Views: 17 · Published: 2021/12/30 8:49:54 · Tag: solr


Problem description

This question is similar to "Solr doesn't overwrite - duplicated uniqueKey entries", but my situation differs: I have a large body of existing documents that were added to the collection with no child documents, and I am using standalone (not cloud) Solr 6.4 rather than 5.3.1. We recently enabled child documents so that we could store richer data.

We use SolrJ to load data into Solr and to query it, but to isolate the issue we're seeing, I used the command-line Solr post tool to upload the following document:

<add>
    <doc>
        <field name="id">1</field>
        <field name="solr_record_type">1</field>
        <field name="title">Fabulous Book</field>
        <field name="author">Angelo Author</field>
    </doc>
</add>

Search results were as expected, using q=id:1 and fl=id,title,index_date,[child parentFilter="solr_record_type:1"]:

 "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"1",
        "title":"Fabulous Book",
        "index_date":"2019-01-16T23:06:57.221Z"}]
  }
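For reference, the query above can be assembled as a URL with proper escaping. This is only a minimal sketch: the host, port, core name (`mycore`) and the `wt=json` parameter are assumptions for illustration and do not come from the question; only `q` and `fl` are taken from it.

```python
from urllib.parse import urlencode

# Assumed values: host, port and core name are placeholders, not from the question.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "mycore"  # hypothetical core name


def build_select_url(doc_id: str) -> str:
    """Build the /select URL for the query used in the question."""
    params = {
        "q": f"id:{doc_id}",
        # The [child] transformer attaches child documents to each matching parent.
        "fl": 'id,title,index_date,[child parentFilter="solr_record_type:1"]',
        "wt": "json",  # assumed response format
    }
    # urlencode percent-escapes the colon, brackets, quotes and spaces.
    return f"{SOLR_BASE}/{CORE}/select?{urlencode(params)}"


print(build_select_url("1"))
```

The resulting URL can be fetched with any HTTP client; the sketch deliberately stops at building the string so that it runs without a Solr instance.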

Then I updated the document by posting the following:

<add>
    <doc>
        <field name="id">1</field>
        <field name="solr_record_type">1</field>
        <field name="title">Fabulous Book</field>
        <field name="author">Angelo Author</field>
        <doc>
            <field name="id">1-1</field>
            <field name="solr_record_type">2</field>
            <field name="contributor_name">Polly Math</field>
            <field name="contributor_type">3</field>
        </doc>
    </doc>
</add>

Then, repeating my search on the unique id field, I got the following duplicate result (numFound is now 2), which is undesirable:

    "response":{"numFound":2,"start":0,"docs":[
      {
        "id":"1",
        "title":"Fabulous Book",
        "index_date":"2019-01-16T23:06:57.221Z",
        "_childDocuments_":[
        {
          "id":"1-1",
          "solr_record_type":2,
          "contributor_name":"Polly Math",
          "contributor_type":3,
          "index_date":"2019-01-16T23:09:29.142Z"}]},
      {
        "id":"1",
        "title":"Fabulous Book",
        "index_date":"2019-01-16T23:09:29.142Z",
        "_childDocuments_":[
        {
          "id":"1-1",
          "solr_record_type":2,
          "contributor_name":"Polly Math",
          "contributor_type":3,
          "index_date":"2019-01-16T23:09:29.142Z"}]}]
  }
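A duplicate like the one above can be detected mechanically by counting top-level ids in the response. The following is a small sketch: the `duplicate_ids` helper is hypothetical, and the sample `response` dict is a trimmed copy of the output shown above, not a live Solr result.

```python
from collections import Counter


def duplicate_ids(response: dict) -> list:
    """Return the ids that occur more than once among the top-level docs."""
    counts = Counter(doc["id"] for doc in response["docs"])
    return sorted(doc_id for doc_id, n in counts.items() if n > 1)


# Trimmed copy of the "response" object from the question (child docs omitted).
response = {
    "numFound": 2,
    "start": 0,
    "docs": [
        {"id": "1", "title": "Fabulous Book"},
        {"id": "1", "title": "Fabulous Book"},
    ],
}

print(duplicate_ids(response))
```

Only parent-level ids are counted here; the nested `_childDocuments_` entries are ignored on purpose, since the problem described is a duplicated parent.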

Going the other way, if I start with a document that was initially loaded with a child document, like the following:

<add>
    <doc>
        <field name="id">2</field>
        <field name="solr_record_type">1</field>
        <field name="title">Wonderful Book</field>
        <field name="author">Andy Author</field>
        <doc>
            <field name="id">2-1</field>
            <field name="solr_record_type">2</field>
            <field name="contributor_name">Polly Math</field>
            <field name="contributor_type">3</field>
        </doc>
    </doc>
</add>

And then I update it with a document that has no children:

<add>
    <doc>
        <field name="id">2</field>
        <field name="solr_record_type">1</field>
        <field name="title">Wonderful Book</field>
        <field name="author">Andy Author</field>
    </doc>
</add>

The result still has the child document:

  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"2",
        "title":"Wonderful Book",
        "index_date":"2019-01-16T23:09:39.389Z",
        "_childDocuments_":[
        {
          "id":"2-1",
          "title_id":2,
          "title_instance_id":2,
          "solr_record_type":2,
          "contributor_name":"Polly Math",
          "contributor_type":3,
          "index_date":"2019-01-16T23:07:04.861Z"}]}]
  }

This is strange, because if I replace a document that has 2 child documents with one that has only 1 child document, Solr does drop the extra child. In this case, however, the child document is not dropped.

Two kinds of update do work as I'd expect: updating a childless document without adding children, and updating a document with children without removing all of them.

I have a large body of existing documents without children, to which I may add children, and eventually I may have many documents with children that later drop them. Given that, what is the best way to update these records without generating duplicate records or losing updates?
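One possible approach, offered only as a sketch and not confirmed anywhere in this question, is to delete the old parent and its block explicitly before re-adding the document, so that the update does not rely on Solr's own overwrite logic. It assumes the schema defines the `_root_` field that Solr uses for nested documents; the helper names `delete_payload` and `add_payload` are hypothetical. The two XML messages would be posted to the update handler in this order, followed by a commit.

```python
import xml.etree.ElementTree as ET


def delete_payload(parent_id: str) -> str:
    """Build a delete-by-query message covering the parent and its block."""
    delete = ET.Element("delete")
    query = ET.SubElement(delete, "query")
    # Match the parent by id and, assuming a _root_ field exists, any block members.
    query.text = f'id:"{parent_id}" OR _root_:"{parent_id}"'
    return ET.tostring(delete, encoding="unicode")


def add_payload(parent: dict, children: list) -> str:
    """Build an <add> message with the children nested inside the parent <doc>."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in parent.items():
        ET.SubElement(doc, "field", name=name).text = str(value)
    for child in children:
        child_doc = ET.SubElement(doc, "doc")
        for name, value in child.items():
            ET.SubElement(child_doc, "field", name=name).text = str(value)
    return ET.tostring(add, encoding="unicode")


parent = {"id": "1", "solr_record_type": 1, "title": "Fabulous Book"}
children = [{"id": "1-1", "solr_record_type": 2, "contributor_name": "Polly Math"}]

print(delete_payload("1"))
print(add_payload(parent, children))
```

Passing an empty `children` list produces a plain parent document, so the same two-step sequence would also cover the case where a document drops all of its children. Whether this is the best option for a large collection is an open question; it trades an extra delete per update for predictable results.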
