Scaling with regard to Nested vs Parent/Child Documents

Question

I'm running a proof of concept for us to run nested queries on more "normalised" data in ES.

For example, with nested:

Customer ->
  - name
  - email
  - events ->
    - created
    - type

Now I have a situation where the list of events for a given customer can be moved to another customer. For example, Customer A has 50 events and Customer B has 5000 events.

I now want to move all events from Customer A into Customer B.

At scale, with millions of customers and with queries run against this data to drive graphs in a UI, is Parent/Child more suitable, or should Nested be able to handle it?

What are the pros and cons in my situation?

Recommended answer

It's hard to give you even rough performance metrics like "Nested is good enough", but I can give you some details about Nested vs Parent/Child that can help. I'd still recommend working up a few benchmark tests to verify performance is acceptable.

Nested

  • Nested docs are stored in the same Lucene block as each other, which helps read/query performance. Reading a nested doc is faster than the equivalent parent/child.
  • Updating a single field in a nested document (parent or nested children) forces ES to reindex the entire nested document. This can be very expensive for large nested docs.
  • Changing the "parent" means ES will: delete the old doc, reindex the old doc with less nested data, delete the new doc, and reindex the new doc with the new nested data.
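
The move described in the last bullet can be sketched in plain Python, without an Elasticsearch client. This is only an illustration under assumptions: the mapping, the field types, the sample event values and the helper names (`make_customer`, `move_nested_events`) are hypothetical and not taken from the question. It shows that with nested data both customers have to be rebuilt and written back in full.

```python
import copy

# Hypothetical mapping for the question's data: "events" is a nested field.
NESTED_MAPPING = {
    "mappings": {
        "properties": {
            "name": {"type": "keyword"},
            "email": {"type": "keyword"},
            "events": {
                "type": "nested",
                "properties": {
                    "created": {"type": "date"},
                    "type": {"type": "keyword"},
                },
            },
        }
    }
}


def make_customer(name, n_events):
    """Build a sample customer document with n_events nested events."""
    events = [{"created": "2015-01-01", "type": "click"} for _ in range(n_events)]
    return {"name": name, "email": name.lower() + "@example.com", "events": events}


def move_nested_events(source, target):
    """Return the two full documents that must be re-indexed after the move.

    The inputs are not mutated. Both returned documents would have to be
    sent to ES as complete replacements of the stored ones.
    """
    new_source = copy.deepcopy(source)
    new_target = copy.deepcopy(target)
    new_target["events"].extend(new_source["events"])
    new_source["events"] = []
    return new_source, new_target


customer_a = make_customer("A", 50)
customer_b = make_customer("B", 5000)
new_a, new_b = move_nested_events(customer_a, customer_b)
print(len(new_a["events"]), len(new_b["events"]))  # 0 5050
```

Moving 50 events therefore means rewriting Customer B's document with all 5050 events, not just the 50 that changed.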

Parent/Child

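  • Children are stored separately from the parent, but are routed to the same shard. So parent/child is slightly less performant on read/query than nested.
  • Parent/child mappings have a bit of extra memory overhead, since ES maintains a "join" list in memory.
  • Updating a child doc does not affect the parent or any other children, which can potentially save a lot of indexing on large docs.
  • Changing the parent means you will delete the old child and then index an identical doc under the new parent.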
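
As a rough sketch of the re-parenting step in the last bullet, the following builds the bulk-style actions for moving child documents to a new parent. It assumes the `join` field type of more recent Elasticsearch versions, which may differ from the parent/child mechanism available when the answer was written. The field name `relation`, the index name, the document IDs and the helper `reparent_actions` are all hypothetical. Nothing is sent to a cluster; the code only constructs the request bodies.

```python
# Hypothetical mapping: a join field named "relation" links customer -> event.
JOIN_MAPPING = {
    "mappings": {
        "properties": {
            "relation": {"type": "join", "relations": {"customer": "event"}},
            "created": {"type": "date"},
            "type": {"type": "keyword"},
        }
    }
}


def reparent_actions(index, child_docs, old_parent, new_parent):
    """Build (action, source) pairs that move children to a new parent.

    Each child is deleted using the old parent's routing and then indexed
    again with the same content, routed to the new parent.
    """
    actions = []
    for child_id, source in child_docs.items():
        delete_meta = {"delete": {"_index": index, "_id": child_id, "routing": old_parent}}
        actions.append((delete_meta, None))

        new_source = dict(source)
        new_source["relation"] = {"name": "event", "parent": new_parent}
        index_meta = {"index": {"_index": index, "_id": child_id, "routing": new_parent}}
        actions.append((index_meta, new_source))
    return actions


# 50 events currently attached to Customer A.
events_of_a = {
    "evt-%d" % i: {
        "created": "2015-01-01",
        "type": "click",
        "relation": {"name": "event", "parent": "customer-a"},
    }
    for i in range(50)
}

actions = reparent_actions("customers", events_of_a, "customer-a", "customer-b")
print(len(actions))  # 100: one delete and one index per event
```

Only the 50 moved events are touched. Customer B's existing 5000 events and both parent documents are left alone.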

It is possible Nested will work fine, but if you think there is the possibility for a lot of "data shuffling", then Parent/Child may be more suitable. Nested is best suited for instances where the nested data is not updated frequently but read often. Parent/Child is better for arrangements where the data moves around more frequently.
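
To put rough numbers on "data shuffling", here is a back-of-the-envelope count for the example in the question (50 events moved onto a customer with 5000). It assumes that each nested object is indexed as its own hidden Lucene document alongside the root document, and it ignores deletes, segment merges and network cost, so it is not a substitute for a benchmark. The function names are made up for this illustration.

```python
def nested_docs_indexed(source_events, target_events):
    """Lucene docs written when moving all source events into the target (nested).

    Both customers are re-indexed in full: one root doc each, plus one
    hidden doc per nested event that remains after the move.
    """
    source_after = 1  # root doc only, no events left
    target_after = 1 + source_events + target_events
    return source_after + target_after


def parent_child_docs_indexed(source_events):
    """Docs written when re-parenting children: only the moved events."""
    return source_events


print(nested_docs_indexed(50, 5000))   # 5052
print(parent_child_docs_indexed(50))   # 50
```

The gap grows with the size of the receiving customer, which is why frequent moves favour Parent/Child while read-heavy, rarely updated data favours Nested.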
