How to do a join in Elasticsearch -- or at the Lucene level

Question

What's the best way to do the equivalent of an SQL join in Elasticsearch?

I have an SQL setup with two large tables: Persons and Items. A Person can own many items. Both Person and Item rows can change (i.e. be updated). I have to run searches which filter by aspects of both the person and the item.

In Elasticsearch, it looks like you could make Person a nested document of Item, then use has_child.

But: if you then update a Person, I think you'd need to update every Item they own (which could be a lot).

Is that correct? Is there a nice way to solve this query in Elasticsearch?

Recommended answer

As already mentioned, the way to go is parent/child. The point is that nested documents are extremely performant, but in order to update them you need to re-submit the whole structure (parent + nested documents). Although nested documents are implemented internally as separate Lucene documents, those nested docs are neither visible nor directly accessible. In fact, when using nested documents you need to use the proper queries to access them (nested query, nested filter, nested facet etc.).
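To make the update cost concrete, here is a minimal sketch in Python that only builds plain dictionaries, so no cluster is needed. The document shape (a Person with an `items` array) and the field names are assumptions made up for illustration, not taken from the question's schema. It shows that changing one nested item means producing the whole structure again, and what a nested query body looks like.

```python
import json

# Hypothetical document: one Person with its Items embedded as nested objects.
person_doc = {
    "name": "Alice",
    "city": "Berlin",
    "items": [
        {"title": "bike", "color": "red"},
        {"title": "lamp", "color": "blue"},
    ],
}


def rename_item(doc, old_title, new_title):
    """Return a full copy of the document with one nested item changed.

    The whole returned structure is what would have to be re-indexed;
    there is no separate handle on an individual nested item.
    """
    updated = dict(doc)
    updated["items"] = [
        {**item, "title": new_title} if item["title"] == old_title else dict(item)
        for item in doc["items"]
    ]
    return updated


def nested_query(path, field, value):
    """Build a search body that matches parents by a field of a nested doc."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {"match": {f"{path}.{field}": value}},
            }
        }
    }


reindexed = rename_item(person_doc, "lamp", "desk lamp")
body = nested_query("items", "color", "red")
print(json.dumps(body, indent=2))
```

The `items` field would also have to be mapped as type `nested` for the query to work; the mapping is left out here.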

On the other hand parent/child allows you to have separate documents that refer to each other, which can be updated independently. It has a cost in terms of performance and memory used but it is way more flexible than nested documents.
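As a rough sketch of what the parent/child route looks like from the query side, the Python below builds `has_child` and `has_parent` request bodies as plain dictionaries. The type names `person` and `item` and the fields `city` and `color` are assumptions for illustration. How the parent/child relationship is declared in the mapping differs between Elasticsearch versions, so that part is deliberately omitted.

```python
def has_child_query(child_type, child_query):
    """Body that returns parent documents having at least one matching child."""
    return {"query": {"has_child": {"type": child_type, "query": child_query}}}


def has_parent_query(parent_type, parent_query):
    """Body that returns child documents whose parent matches."""
    return {
        "query": {
            "has_parent": {"parent_type": parent_type, "query": parent_query}
        }
    }


# Persons who own at least one red item (hypothetical field names).
persons_with_red_item = has_child_query("item", {"match": {"color": "red"}})

# Items whose owner lives in Berlin (hypothetical field names).
items_of_berliners = has_parent_query("person", {"match": {"city": "Berlin"}})
```

Because Person and Item stay separate documents, updating a Person touches only that one document; the Items it owns are left alone.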

As mentioned in this article though, the fact that Elasticsearch helps you manage relations doesn't mean that you must use those features. In a lot of complex use cases it is just better to have some custom logic in the application layer that handles the relations. In fact there are limitations with parent/child too: for instance, you can never get back both parent and children at the same time, whereas nested documents don't allow you to get back only the matching children (for now).
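One possible shape for that application-layer logic is a two-step join: first find the matching persons, then filter items by the owner IDs that came back. The sketch below simulates both steps over in-memory lists so it runs on its own; against a real cluster each step would be a separate search, with the second one filtering on the collected IDs. The data, the `owner_id` field, and the `join_search` helper are all hypothetical.

```python
# Hypothetical data standing in for two separately indexed document types.
persons = [
    {"id": 1, "name": "Alice", "city": "Berlin"},
    {"id": 2, "name": "Bob", "city": "Paris"},
]
items = [
    {"id": 10, "owner_id": 1, "color": "red"},
    {"id": 11, "owner_id": 1, "color": "blue"},
    {"id": 12, "owner_id": 2, "color": "red"},
]


def join_search(persons, items, person_pred, item_pred):
    """Join in the application: filter persons, then items by owner ID."""
    # Step 1: the person-side filter, keyed by ID for the lookup in step 2.
    matching = {p["id"]: p for p in persons if person_pred(p)}
    # Step 2: the item-side filter, restricted to owners found in step 1.
    return [
        {"person": matching[i["owner_id"]], "item": i}
        for i in items
        if i["owner_id"] in matching and item_pred(i)
    ]


results = join_search(
    persons,
    items,
    person_pred=lambda p: p["city"] == "Berlin",
    item_pred=lambda i: i["color"] == "red",
)
```

This returns person and item together in one result, which sidesteps the parent/child limitation described above, at the price of two round trips and a potentially long ID list when many persons match.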
