Map a book in Elasticsearch with many levels, nested vs parent-child relationship
Problem description
When creating the mappings for an index that can search through multiple books, is it preferable to use nested mappings like the one below, or documents with a parent-child relationship?
book: {
  properties: {
    isbn: { //- ISBN of the book
      type: 'string' //- 9783791535661
    },
    title: { //- Title of the book
      type: 'string' //- Alice in Wonderland
    },
    author: { //- Author of the book (maybe should be array)
      type: 'string' //- Lewis Carroll
    },
    category: { //- Category of the book (maybe should be array)
      type: 'string' //- Fantasy
    },
    toc: { //- Array of the chapters in the book
      type: 'nested',
      properties: {
        html: { //- HTML content of a chapter
          type: 'string' //- <!DOCTYPE html><html>...</html>
        },
        title: { //- Title of the chapter
          type: 'string' //- Down the Rabbit Hole
        },
        fileName: { //- File name of this chapter
          type: 'string' //- chapter_1.html
        },
        firstPage: { //- The first page of this chapter
          type: 'integer' //- 3
        },
        numberOfPages: { //- How many pages are in this chapter
          type: 'integer' //- 27
        },
        sections: { //- Array of all of the sections within a chapter
          type: 'nested',
          properties: {
            html: { //- HTML content of a section
              type: 'string' //- <section>...</section>
            },
            title: { //- Title of a section
              type: 'string' //- section number 2 or something
            },
            figures: { //- Array of the figures within a section
              type: 'nested',
              properties: {
                html: { //- HTML content of a figure
                  type: 'string' //- <figure>...</figure>
                },
                caption: { //- Name of a figure
                  type: 'string' //- Figure 1
                },
                id: { //- Id of a figure
                  type: 'string' //- figure4
                }
              }
            },
            paragraphs: { //- Array of the paragraphs within a section
              type: 'nested',
              properties: {
                html: { //- HTML content of a paragraph
                  type: 'string' //- <p>...</p>
                },
                id: { //- Id of a paragraph
                  type: 'string' //- paragraph3
                }
              }
            }
          }
        }
      }
    }
  }
}
The HTML of an entire book is approximately 250 kB in size. I would want to query things such as:
- the best matching paragraph, including its nearest paragraphs on either side
- the best matching section from a single book including any child sections
- the best figure given it is inside a section with a matching title
- etc
I don't really know the specifics of the queries I would want to perform, but it is important to have enough flexibility to try out very weird ones without having to change my mappings too much.
If you use the nested type, everything will be contained in the same _source document, which for big books can be quite a mouthful.
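For the nested layout, the third query from the question (the best figure inside a section with a matching title) could be sketched as a chain of `nested` queries, one per nesting level. This is only a sketch under assumptions: the helper name `buildFigureQuery` and the search terms are made up for illustration, and the field paths are taken from the mapping in the question.

```javascript
// Hypothetical helper: builds a search request body for
// "figures whose caption matches, inside a section whose title matches".
// Field paths follow the nested mapping shown in the question.
function buildFigureQuery(sectionTitle, captionText) {
  return {
    query: {
      nested: {
        path: 'toc',
        query: {
          nested: {
            path: 'toc.sections',
            query: {
              bool: {
                must: [
                  // The section title and the figure must match within
                  // the same section object.
                  { match: { 'toc.sections.title': sectionTitle } },
                  {
                    nested: {
                      path: 'toc.sections.figures',
                      query: {
                        match: { 'toc.sections.figures.caption': captionText }
                      },
                      // inner_hits asks for the matching figures themselves,
                      // in addition to the enclosing book document.
                      inner_hits: {}
                    }
                  }
                ]
              }
            }
          }
        }
      }
    }
  };
}

console.log(JSON.stringify(buildFigureQuery('Rabbit Hole', 'Figure 1'), null, 2));
```

Note that the top-level hit is still the whole book, so without source filtering the full _source comes back with every match.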
If you instead use parent/child docs for each chapter and/or section, you might end up with smaller chunks which are more chewable...
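As a rough illustration of the parent/child layout, the mappings might look like the sketch below. It assumes an Elasticsearch version that still has the `string` type and the `_parent` mapping setting (later versions replaced `_parent` with a `join` field), and the type names `chapter` and `section` as well as the search terms are assumptions made for this example.

```javascript
// Sketch only: one mapping per type, linked through `_parent`.
// `chapter` and `section` are illustrative type names.
const mappings = {
  book: {
    properties: {
      isbn: { type: 'string' },
      title: { type: 'string' },
      author: { type: 'string' }
    }
  },
  chapter: {
    _parent: { type: 'book' }, // each chapter is a child of a book
    properties: {
      title: { type: 'string' },
      html: { type: 'string' }
    }
  },
  section: {
    _parent: { type: 'chapter' }, // each section is a child of a chapter
    properties: {
      title: { type: 'string' },
      html: { type: 'string' }
    }
  }
};

// Search body to run against the `section` type: sections mentioning
// "rabbit" whose parent chapter has a matching title.
const sectionsInMatchingChapter = {
  query: {
    bool: {
      must: [
        { match: { html: 'rabbit' } },
        {
          has_parent: {
            parent_type: 'chapter',
            query: { match: { title: 'Rabbit Hole' } }
          }
        }
      ]
    }
  }
};

console.log(JSON.stringify({ mappings, sectionsInMatchingChapter }, null, 2));
```

With more than two generations (book, chapter, section), the documents also need to be routed so that a section ends up on the same shard as its chapter and book.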
As always, it heavily depends on the queries you will want to make, so you should first think about all the use cases you will want to support and then you'll be better armed to figure out which approach is best.
There's another approach which uses neither nested nor parent/child, and which simply involves denormalization. Concretely, you pick the smallest "entity" you want to consider, e.g. a section, and then simply create standalone documents for each section. In those section documents, you'd have fields for the book title, author, chapter title, section title, etc.
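To make the denormalized variant concrete, here is a minimal sketch that flattens one book object (shaped like the nested mapping in the question) into standalone section documents. The helper name `toSectionDocs`, the output field names, and the `chapterIndex`/`sectionIndex` position fields are assumptions added for illustration; the position fields are one possible way to fetch neighbouring sections later.

```javascript
// Hypothetical helper: turns one nested book object into one flat
// document per section, copying the book and chapter fields onto each.
function toSectionDocs(book) {
  const docs = [];
  (book.toc || []).forEach((chapter, chapterIndex) => {
    (chapter.sections || []).forEach((section, sectionIndex) => {
      docs.push({
        isbn: book.isbn,
        bookTitle: book.title,
        author: book.author,
        chapterTitle: chapter.title,
        chapterIndex, // position of the chapter within the book
        sectionTitle: section.title,
        sectionIndex, // position of the section within the chapter
        html: section.html
      });
    });
  });
  return docs;
}

// Made-up sample data for the example.
const sampleBook = {
  isbn: '9783791535661',
  title: 'Alice in Wonderland',
  author: 'Lewis Carroll',
  toc: [
    {
      title: 'Down the Rabbit Hole',
      sections: [
        { title: 'Section 1', html: '<section>...</section>' },
        { title: 'Section 2', html: '<section>...</section>' }
      ]
    }
  ]
};

const sectionDocs = toSectionDocs(sampleBook);
console.log(sectionDocs);
```

Each resulting document can be indexed on its own, so a search hit is a single section that already carries its book and chapter context.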
You can try each approach in its own index and see how it goes for your use cases.