Best practice for handling many-to-many relationships in Elasticsearch?

Problem description

I'm pretty sure I know the answer to this question but am looking for confirmation from someone with more Elasticsearch experience than me.

Let's say I've got a database containing Authors and Books. An author can be associated with 0 or more books, and a book can be associated with 1 or more authors. We want users to be able to search on author name to find the author and all his/her books, and we also want them to be able to search on book title to get back its author(s). We know there will be plenty of multi-author books.

Because Elasticsearch only directly supports one level of parent-child relationships, and because children can only have one parent, it seems to me that we need to denormalize the data and use nested objects to establish this relationship. If we modify properties of an author who has published 23 books, we will need to reindex the author record and all 23 of his/her book records.
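The fan-out described above can be sketched like this. It assumes each book embeds its authors as a `nested` field and that updates go through the `_bulk` API as newline-delimited JSON. The index name `books`, the field names, and the helper `reindex_books_for_author` are hypothetical. The point is only that one author edit turns into one full reindex per book that embeds that author.

```python
import json

# Hypothetical typeless mapping: each book embeds its authors as nested objects.
BOOKS_MAPPING = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "authors": {
                "type": "nested",  # each author is indexed as a hidden sub-document
                "properties": {
                    "id": {"type": "keyword"},
                    "name": {"type": "text"},
                },
            },
        }
    }
}


def reindex_books_for_author(author, books, index="books"):
    """Build a _bulk NDJSON body that rewrites every book embedding `author`.

    `books` maps book id -> book source. Each book whose nested `authors`
    list contains the author's id gets its embedded copy replaced, and the
    whole book is re-sent. Co-authors are left untouched.
    """
    lines = []
    for book_id, book in books.items():
        if not any(a["id"] == author["id"] for a in book["authors"]):
            continue  # this book does not embed the changed author
        updated = dict(book)  # shallow copy so the caller's data is not mutated
        updated["authors"] = [
            author if a["id"] == author["id"] else a for a in book["authors"]
        ]
        lines.append(json.dumps({"index": {"_index": index, "_id": book_id}}))
        lines.append(json.dumps(updated))
    # The _bulk API expects newline-delimited JSON terminated by a newline.
    return "\n".join(lines) + "\n" if lines else ""
```

For an author with 23 books, this produces 46 lines: one action line and one source line per book. The author document itself still has to be indexed separately.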

In my fantasy world, I'd love to have those 23 books each contain an array of author IDs so that I don't have to reindex books when I reindex authors. It seems like this would definitely be possible using Elasticsearch's parent-child support if a book could only have one author, but because of the many-to-many requirement, I have to use nested objects and reindex any related objects whenever anything changes.
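If each book stores only an `author_ids` array, a search by author name becomes two requests, which amounts to an application-side join. The first request finds the matching authors. The second fetches the books whose `author_ids` contain any of those IDs. This minimal sketch assumes a separate `authors` index with a `name` text field and `author_ids` mapped as `keyword`. The helper names and the sample response below are made up for illustration.

```python
def author_name_query(name):
    # Step 1: full-text match against a (hypothetical) authors index.
    return {"query": {"match": {"name": name}}}


def ids_from_response(response):
    # Pull document IDs out of a standard search response body.
    return [hit["_id"] for hit in response["hits"]["hits"]]


def books_by_author_ids_query(author_ids):
    # Step 2: a terms query matches a book if ANY value in its author_ids
    # array equals one of the given IDs, so multi-author books are covered.
    return {"query": {"terms": {"author_ids": list(author_ids)}}}


# Illustrative response shape from step 1 (not real output).
sample_authors_response = {
    "hits": {"hits": [{"_id": "a1", "_source": {"name": "Jane Doe"}},
                      {"_id": "a7", "_source": {"name": "Jane Roe"}}]}
}

step1 = author_name_query("Jane")
step2 = books_by_author_ids_query(ids_from_response(sample_authors_response))
```

With this layout, renaming an author touches only the author document. The cost is an extra round trip for every author-name search. The title-to-authors direction also needs a second lookup, for example a multi-get on the IDs stored in the book.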

Is this correct? It certainly seems like more work (and certainly more updates), but I want to do this the right way, not the "clever" way that introduces complexity and bugs and madness.

Any guidance would be appreciated.

Solution

From your question I can safely assume that ES will not be your primary data store. So the main question in denormalizing your many-to-many relationship is to figure out how you will use ES and for what, that is, which queries you expect to build.

Think in terms of "query command" design and denormalize accordingly. Here are a few pointers:

  • Denormalizing author IDs into the book: would you expect a user to run a search such as "all books for userId=XYZ"? If not, you are better off storing the author's name as a multi-field in your Book document.
  • Duplicate, duplicate and duplicate. Figure out which data will be heavily updated (authors, since books generally do not gain authors after publication). Denormalize authors into books (names, most likely). Duplicate into another document type something like "author_books", which would be a child of authors and can be updated fairly often (again, denormalize the title and any other fields you need to search from the author's perspective).
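Both pointers can be sketched as mappings. Recent Elasticsearch versions allow only one mapping type per index, so this sketch expresses the separate "author_books" document type as a `join` field in the authors index. Each (author, book) pair gets one `author_book` child, routed to its parent's shard. All index names, field names, and helpers here are hypothetical.

```python
import json

# Hypothetical books mapping: author names copied in as a multi-field
# (analyzed text for search, keyword sub-field for exact match/aggregations);
# author_ids kept only if "all books for author X" queries are required.
BOOKS_DENORM_MAPPING = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "author_names": {"type": "text", "fields": {"raw": {"type": "keyword"}}},
            "author_ids": {"type": "keyword"},
        }
    }
}

# Hypothetical authors index: authors are parents; duplicated "author_book"
# children carry the book fields searched from the author's side.
AUTHORS_MAPPING = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            "title": {"type": "text"},
            "author_relation": {"type": "join", "relations": {"author": "author_book"}},
        }
    }
}


def author_book_actions(author_id, book_id, title, index="authors"):
    """Bulk action + source for one duplicated author_book child document."""
    # Children must live on the parent's shard, hence routing=author_id.
    action = {"index": {"_index": index, "_id": f"{author_id}:{book_id}", "routing": author_id}}
    source = {
        "title": title,
        "author_relation": {"name": "author_book", "parent": author_id},
    }
    return json.dumps(action) + "\n" + json.dumps(source) + "\n"


def authors_with_book_title_query(title):
    # Find parent authors that have at least one child whose title matches.
    return {"query": {"has_child": {"type": "author_book", "query": {"match": {"title": title}}}}}
```

In this layout a co-authored book gets one `author_book` child under each of its authors, which is the duplication recommended above. Because books rarely gain authors after publication, those children would change mainly when a title or another book field changes.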

Hope this makes some sense ;)
