Scaling with regard to Nested vs Parent/Child Documents
Question
I'm running a proof of concept to see whether we can run nested queries on more "normalised" data in ES.
For example, with nested:
Customer ->
- name
- email
- events ->
  - created
  - type
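A minimal sketch of how the Customer -> events model above might be mapped with a nested field, written as a Python dict that could be sent as the body of a create-index request. The field types (`keyword`, `date`) and the sample values are assumptions for illustration; only the `nested` type itself is standard Elasticsearch mapping syntax.

```python
import json

# Hypothetical mapping for a "customers" index: each customer document
# embeds its events as a nested field, so every event is indexed as a
# hidden Lucene document stored alongside its customer.
customer_mapping = {
    "mappings": {
        "properties": {
            "name": {"type": "keyword"},
            "email": {"type": "keyword"},
            "events": {
                "type": "nested",
                "properties": {
                    "created": {"type": "date"},
                    "type": {"type": "keyword"},
                },
            },
        }
    }
}

# Example customer document matching the mapping (values are made up).
customer_a = {
    "name": "Customer A",
    "email": "a@example.com",
    "events": [
        {"created": "2016-01-01T00:00:00Z", "type": "signup"},
        {"created": "2016-01-02T00:00:00Z", "type": "login"},
    ],
}

if __name__ == "__main__":
    print(json.dumps(customer_mapping, indent=2))
```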
Now I have a situation where the list of events for a given customer can be moved to another customer. For example, Customer A has 50 events and Customer B has 5000 events.
I now want to move all events from Customer A into Customer B.
At a scale of millions of customers, with queries run against this data to drive graphs in a UI, is Parent/Child more suitable, or should nested be able to handle it?
What are the pros and cons in my situation?
Recommended Answer
It's hard to give you even rough performance metrics like "Nested is good enough", but I can give you some details about Nested vs Parent/Child that can help. I'd still recommend working up a few benchmark tests to verify performance is acceptable.
Nested
- Nested docs are stored in the same Lucene block as each other, which helps read/query performance. Reading a nested doc is faster than the equivalent parent/child.
- Updating a single field in a nested document (parent or nested children) forces ES to reindex the entire nested document. This can be very expensive for large nested docs.
- Changing the "parent" means ES will: delete old doc, reindex old doc with less nested data, delete new doc, reindex new doc with new nested data.
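To make the last point concrete, here is a minimal sketch of the "move" under nested, using plain Python dicts in place of real Elasticsearch documents. The helper names `move_nested_events` and `make_events` are hypothetical; the point is only that the move has to be expressed as two complete replacement documents, each of which ES would reindex in full.

```python
from copy import deepcopy


def move_nested_events(source_doc, target_doc):
    """Return rewritten copies of both customer documents after moving all
    events from source_doc into target_doc.

    With nested fields the events cannot be updated on their own, so the
    move becomes two full replacement documents (the old versions would be
    deleted and the new ones indexed with every nested event)."""
    new_source = deepcopy(source_doc)
    new_target = deepcopy(target_doc)
    new_target["events"] = new_target.get("events", []) + new_source.get("events", [])
    new_source["events"] = []
    return new_source, new_target


def make_events(n):
    # Hypothetical placeholder events; only the count matters here.
    return [{"created": "2016-01-01T00:00:00Z", "type": "login"} for _ in range(n)]


customer_a = {"name": "Customer A", "events": make_events(50)}
customer_b = {"name": "Customer B", "events": make_events(5000)}

new_a, new_b = move_nested_events(customer_a, customer_b)
print(len(new_a["events"]), len(new_b["events"]))  # 0 5050
```

Note that the 5000 events already on Customer B are rewritten too, even though none of them changed.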
Parent/Child
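- Children are stored separately from the parent, but are routed to the same shard. So parent/child has slightly lower read/query performance than nested.
- Parent/child mappings have a bit of extra memory overhead, because ES maintains a "join" list in memory.
- Updating a child doc does not affect the parent or any other children, which can save a lot of indexing on large docs.
- Changing the parent means you delete the old child doc, then index the same doc under the new parent.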
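A minimal sketch of the same move under parent/child, assuming a single index that uses the `join` field type (the parent/child mechanism in Elasticsearch 6.0 and later). The index name `customers`, the join field name `customer_event`, and the helper `move_child_events_bulk` are hypothetical. The helper builds an NDJSON body for the `_bulk` API that deletes each child from under its old parent (using that parent's ID as routing) and indexes it again under the new one; neither parent document is touched.

```python
import json

# Hypothetical mapping: customers and events live in one index and are
# linked by a `join` field declaring customer -> event.
join_mapping = {
    "mappings": {
        "properties": {
            "customer_event": {"type": "join", "relations": {"customer": "event"}},
            "name": {"type": "keyword"},
            "email": {"type": "keyword"},
            "created": {"type": "date"},
            "type": {"type": "keyword"},
        }
    }
}


def move_child_events_bulk(index, events, old_parent, new_parent):
    """Build a _bulk body that re-parents the given child events.

    `events` maps child document IDs to their source (without the join
    field). Children must be routed to their parent's shard, so the delete
    uses the old parent ID as routing and the re-index uses the new one."""
    lines = []
    for event_id, source in events.items():
        lines.append({"delete": {"_index": index, "_id": event_id, "routing": old_parent}})
        lines.append({"index": {"_index": index, "_id": event_id, "routing": new_parent}})
        body = dict(source)
        body["customer_event"] = {"name": "event", "parent": new_parent}
        lines.append(body)
    # The _bulk API expects newline-delimited JSON terminated by a newline.
    return "\n".join(json.dumps(line) for line in lines) + "\n"


events_of_a = {
    f"evt-{i}": {"created": "2016-01-01T00:00:00Z", "type": "login"} for i in range(50)
}
bulk_body = move_child_events_bulk("customers", events_of_a, "A", "B")
```

Only Customer A's 50 events appear in the request; Customer B's existing 5000 children are not rewritten.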
It is possible Nested will work fine, but if you think there is the possibility for a lot of "data shuffling", then Parent/Child may be more suitable. Nested is best suited for instances where the nested data is not updated frequently but read often. Parent/Child is better for arrangements where the data moves around more frequently.
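A rough back-of-envelope count of what one move costs under each model, using the question's numbers. It assumes each nested event is one hidden Lucene document plus one root document per customer, and counts only the Lucene documents written per move. It ignores segment merges, replicas, refreshes and the parent/child memory overhead, so it is no substitute for the benchmarks recommended above.

```python
def nested_docs_written(moved_events, target_events):
    """Lucene documents written when all events move between two nested
    customer docs: each customer is reindexed as one root document plus
    one hidden document per nested event."""
    source_rewrite = 1  # source customer, now with no events
    target_rewrite = 1 + target_events + moved_events
    return source_rewrite + target_rewrite


def parent_child_docs_written(moved_events):
    """Lucene documents written when child docs move to a new parent:
    only the moved children are indexed again (each old copy is deleted)."""
    return moved_events


for moved, target in [(50, 5000), (50, 50), (5000, 50)]:
    print(moved, target, nested_docs_written(moved, target), parent_child_docs_written(moved))
# 50 5000 5052 50
# 50 50 102 50
# 5000 50 5052 5000
```

The nested cost grows with the size of the destination customer, while the parent/child cost depends only on how many events move. That gap is why frequent "data shuffling" into large customers favours Parent/Child.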