Elasticsearch deeper level parent-child relationship (grandchild)

Question

I need to index 3 levels (or more) of child-parent. For example, the levels might be an author, a book, and characters from that book.

However, when indexing more than two levels there is a problem with has_child and has_parent queries and filters. If I have 5 shards, I get about one fifth of the results when running a has_parent query on the lowest level (characters) or a has_child query on the second level (books).

My guess is that a book gets indexed to a shard by its parent id and so will reside together with its parent (author), but a character gets indexed to a shard based on the hash of the book id, which does not necessarily match the actual shard the book was indexed on.

This means that the characters of all books by the same author do not necessarily reside in the same shard (which rather cripples the whole child-parent advantage).

Am I doing something wrong? How can I resolve this? I really need complex queries such as "what authors wrote books with female characters".

I made a gist showing the problem at: https://gist.github.com/eranid/5299628

The bottom line is that if I have a mapping:

"author" : {
  "properties" : {
    "name" : {
      "type" : "string"
    }
  }
},
"book" : {
  "_parent" : {
    "type" : "author"
  },
  "properties" : {
    "title" : {
      "type" : "string"
    }
  }
},
"character" : {
  "_parent" : {
    "type" : "book"
  },
  "properties" : {
    "name" : {
      "type" : "string"
    }
  }
}

and a 5-shard index, I cannot make queries with has_child and has_parent work.

The query:

curl -XPOST 'http://localhost:9200/index1/character/_search?pretty=true' -d '{
  "query": {
    "bool": {
      "must": [
        {
          "has_parent": {
            "parent_type": "book",
            "query": {
              "match_all": {}
            }
          }
        }
      ]
    }
  }
}'

returns only a fifth (approximately) of the characters.

Recommended answer

You are correct: a parent/child relationship can only work when all children of a given parent reside in the same shard as the parent. Elasticsearch achieves this by using the parent id as the routing value. This works well for one level, but it breaks on the second and subsequent levels. In a parent/child/grandchild relationship, parents are routed by their own id and children are routed by their parent's id (which works), but grandchildren are routed by the children's ids and can end up in the wrong shards. To demonstrate with an example, assume that we index 3 documents:

curl -XPUT 'localhost:9200/test-idx/author/Douglas-Adams' -d '{...}'
curl -XPUT 'localhost:9200/test-idx/book/Mostly-Harmless?parent=Douglas-Adams' -d '{...}'
curl -XPUT 'localhost:9200/test-idx/character/Arthur-Dent?parent=Mostly-Harmless' -d '{...}'

Elasticsearch uses the value Douglas-Adams to calculate routing for the document Douglas-Adams, no surprise here. For the document Mostly-Harmless, Elasticsearch sees that it has the parent Douglas-Adams, so it again uses Douglas-Adams to calculate routing and everything is good: the same routing value means the same shard. But for the document Arthur-Dent, Elasticsearch sees that it has the parent Mostly-Harmless, so it uses the value Mostly-Harmless for routing, and as a result the document Arthur-Dent can end up in the wrong shard.
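
The routing behaviour described above can be sketched in a few lines of Python. This is only an illustration: `shard_for`, `effective_routing`, and the use of `zlib.crc32` as the hash are assumptions made for the example, and Elasticsearch's real hash function differs, so the shard numbers printed here will not match a real cluster. What carries over is the rule that a document is placed by hashing its routing value, which defaults to the parent id when one is given and to the document's own id otherwise.

```python
import zlib

NUM_SHARDS = 5  # same shard count as in the question


def shard_for(routing_value, num_shards=NUM_SHARDS):
    """Map a routing value to a shard number.

    zlib.crc32 is a stand-in hash for illustration, not Elasticsearch's own.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


def effective_routing(doc_id, parent=None, routing=None):
    """Explicit routing wins, then the parent id, then the document's own id."""
    if routing is not None:
        return routing
    if parent is not None:
        return parent
    return doc_id


# (type, id, parent) for the three documents from the example
docs = [
    ("author", "Douglas-Adams", None),
    ("book", "Mostly-Harmless", "Douglas-Adams"),
    ("character", "Arthur-Dent", "Mostly-Harmless"),
]

for doc_type, doc_id, parent in docs:
    value = effective_routing(doc_id, parent=parent)
    print(f"{doc_type:9} {doc_id:16} routed by {value:16} -> shard {shard_for(value)}")
```

The author and the book always print the same shard because both are routed by Douglas-Adams. The character is routed by Mostly-Harmless, so it shares that shard only by chance.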

The solution is to explicitly specify a routing value for the grandchildren equal to the id of the grandparent:

curl -XPUT 'localhost:9200/test-idx/author/Douglas-Adams' -d '{...}'
curl -XPUT 'localhost:9200/test-idx/book/Mostly-Harmless?parent=Douglas-Adams' -d '{...}'
curl -XPUT 'localhost:9200/test-idx/character/Arthur-Dent?parent=Mostly-Harmless&routing=Douglas-Adams' -d '{...}'
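
For hierarchies deeper than three levels, the same idea presumably extends to routing every descendant by the id of its top-most ancestor. A minimal Python sketch of that bookkeeping follows. The `parents` dictionary, `root_ancestor`, and `index_url` are hypothetical helpers for illustration, not part of any Elasticsearch client, and the URL layout simply mirrors the curl commands above.

```python
from urllib.parse import urlencode

# Hypothetical child -> parent lookup kept by the indexing application
parents = {
    "Mostly-Harmless": "Douglas-Adams",
    "Arthur-Dent": "Mostly-Harmless",
}


def root_ancestor(doc_id, parents):
    """Follow parent links upward until a document without a parent is reached."""
    seen = set()
    while doc_id in parents:
        if doc_id in seen:
            raise ValueError(f"cycle in parent links at {doc_id!r}")
        seen.add(doc_id)
        doc_id = parents[doc_id]
    return doc_id


def index_url(index, doc_type, doc_id, parents, host="localhost:9200"):
    """Build the indexing URL, adding routing only below the first child level."""
    url = f"http://{host}/{index}/{doc_type}/{doc_id}"
    parent = parents.get(doc_id)
    if parent is None:
        return url  # top level: routed by its own id
    params = {"parent": parent}
    root = root_ancestor(doc_id, parents)
    if root != parent:
        # grandchild or deeper: override the default routing (the parent id)
        params["routing"] = root
    return url + "?" + urlencode(params)


print(index_url("test-idx", "author", "Douglas-Adams", parents))
print(index_url("test-idx", "book", "Mostly-Harmless", parents))
print(index_url("test-idx", "character", "Arthur-Dent", parents))
```

Direct children are left with the default parent-based routing, since it already matches the root id. Only deeper levels get an explicit routing parameter.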
