Map a book in Elasticsearch with many levels: nested vs parent-child relationship


Question


When creating the mapping for an index that lets me search through multiple books, is it preferable to use a nested mapping like the one below, or documents with a parent-child relationship?

book: {
  properties: {
    isbn:     {       //- ISBN of the book
      type: 'string'  //- 9783791535661
    },
    title:    {       //- Title of the book
      type: 'string'  //- Alice in Wonderland
    },
    author:   {       //- Author of the book(maybe should be array)
      type: 'string'  //- Lewis Carroll
    },
    category: {       //- Category of the book(maybe should be array)
      type: 'string'  //- Fantasy
    },
    toc: {            //- Array of the chapters in the book
      type: 'nested',
      properties: {
        html: {           //- HTML Content of a chapter
          type: 'string'  //- <!DOCTYPE html><html>...</html>
        },
        title: {          //- Title of the chapter
          type: 'string'  //- Down the Rabbit Hole 
        },
        fileName: {       //- File name of this chapter
          type: 'string'  //- chapter_1.html
        }, 
        firstPage: {      //- The first page of this chapter
          type: 'integer' //- 3
        }, 
        numberOfPages: {  //- How many pages are in this chapter
          type: 'integer' //- 27
        },
        sections: {       //- An array of all of the sections within a chapter
          type: 'nested',
          properties: {
            html: {           //- The html content of a section
              type: 'string'  //- <section>...</section>
            },
            title: {          //- The title of a section
              type: 'string'  //- section number 2 or something
            },
            figures: {        //- Array of the figures within a section
              type: 'nested',
              properties: {
                html: {           //- HTML content of a figure
                  type: 'string'  //- <figure>...</figure>
                },
                caption: {        //- The name of a figure
                  type: 'string'  //- Figure 1
                },
                id: {             //- Id of a figure
                  type: 'string'  //- figure4
                }
              }
            },
            paragraphs: {     //- Array of the paragraphs within a section
              type: 'nested',
              properties: {   
                html: {           //- HTML content of a paragraph
                  type: 'string'  //- <p>...</p>
                },
                id: {             //- Id of a paragraph
                  type: 'string'  //- paragraph3
                }
              }
            }
          }
        }
      }
    }
  }
}

The HTML of an entire book is approximately 250 kB. I would want to run queries such as:

- the best matching paragraph, including its nearest paragraphs on either side
- the best matching section from a single book including any child sections
- the best figure given it is inside a section with a matching title
- etc

I don't really know the specifics of the queries I would want to perform, but it is important to have a lot of flexibility to be able to try out very weird ones without having to change all of my mappings too much.

Answer

If you use the nested type, everything will be contained in the same _source document, which for big books can be quite a mouthful.
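
Even with the nested mapping, a search does not have to return the whole book. As a rough sketch, the Python below builds a request body for the third query in the question (figures inside a section whose title matches). It assumes the field paths from the mapping above and an Elasticsearch version that supports inner_hits on nested queries. The helper name figures_in_matching_sections is made up for illustration, and the body is only constructed here, not sent to a cluster, so treat it as a starting point rather than a verified query.

```python
import json


def figures_in_matching_sections(section_title, figure_text):
    """Build a search body (hypothetical helper): figures whose caption
    matches `figure_text`, inside sections titled like `section_title`."""
    figures = {
        "nested": {
            "path": "toc.sections.figures",
            "query": {"match": {"toc.sections.figures.caption": figure_text}},
            # inner_hits asks for the matching figure objects themselves
            "inner_hits": {"size": 3},
        }
    }
    sections = {
        "nested": {
            "path": "toc.sections",
            "query": {
                "bool": {
                    "must": [
                        {"match": {"toc.sections.title": section_title}},
                        figures,
                    ]
                }
            },
        }
    }
    # Skip the large book _source; only the inner hits are of interest.
    return {"_source": False, "query": sections}


if __name__ == "__main__":
    print(json.dumps(figures_in_matching_sections("Rabbit Hole", "Figure 1"), indent=2))
```

The trade-off stays the same as described above: the figures come back individually, but changing a single paragraph still means reindexing the entire book document.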

Whereas if you use parent/child docs for each chapter and/or section, you might end up with smaller chunks which are more chewable...
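
For comparison, here is what a parent/child layout might look like, written as Python dicts. This is a sketch under assumptions: it uses the join field type from Elasticsearch 6.x and later (older versions, including the ones this question was likely written against, used a _parent mapping instead), it switches the string type to text accordingly, and the field name relation and all function names are invented for the example.

```python
def join_mapping():
    """Mapping with a three-level book -> chapter -> section hierarchy."""
    return {
        "properties": {
            "title": {"type": "text"},
            "html": {"type": "text"},
            # "relation" is a hypothetical field name for the join field
            "relation": {
                "type": "join",
                "relations": {"book": "chapter", "chapter": "section"},
            },
        }
    }


def section_doc(chapter_id, title, html):
    """A standalone section document pointing at its parent chapter.

    When indexing it, the routing value must be the id of the root book
    so that the whole family lives on the same shard.
    """
    return {
        "title": title,
        "html": html,
        "relation": {"name": "section", "parent": chapter_id},
    }


def sections_in_chapters_titled(chapter_title, text):
    """Sections matching `text` whose parent chapter title matches."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"html": text}},
                    {
                        "has_parent": {
                            "parent_type": "chapter",
                            "query": {"match": {"title": chapter_title}},
                        }
                    },
                ]
            }
        }
    }
```

Each section is now a small document of its own, so hits are section-sized and a section can be updated without touching the rest of the book. The cost is the join at query time.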

As always, it heavily depends on the queries you will want to make, so you should first think about all the use cases you will want to support and then you'll be better armed to figure out which approach is best.

There's another approach which uses neither nested nor parent/child, and which simply involves denormalization. Concretely, you pick the smallest "entity" you want to consider, e.g. a section, and then simply create standalone documents for each section. In those section documents, you'd have fields for the book title, author, chapter title, section title, etc.
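
The denormalization step can be sketched in plain Python. The function below takes one book shaped like the mapping in the question and emits one flat document per section, copying the book and chapter fields onto each. The output field names (book_title, chapter_no, and so on) and the id scheme are assumptions made for this example, not something Elasticsearch prescribes.

```python
def flatten_sections(book):
    """Turn one nested book dict into a list of standalone section docs."""
    docs = []
    for chapter_no, chapter in enumerate(book.get("toc", []), start=1):
        for section_no, section in enumerate(chapter.get("sections", []), start=1):
            docs.append({
                # Hypothetical id scheme: isbn-chapter-section
                "id": f"{book['isbn']}-{chapter_no}-{section_no}",
                "isbn": book["isbn"],
                "book_title": book["title"],
                "author": book["author"],
                "chapter_title": chapter["title"],
                "chapter_no": chapter_no,
                # Position within the chapter, so neighbours can be looked up
                "section_no": section_no,
                "section_title": section["title"],
                "html": section["html"],
            })
    return docs


if __name__ == "__main__":
    sample = {
        "isbn": "9783791535661",
        "title": "Alice in Wonderland",
        "author": "Lewis Carroll",
        "toc": [{
            "title": "Down the Rabbit Hole",
            "sections": [
                {"title": "One", "html": "<section>...</section>"},
                {"title": "Two", "html": "<section>...</section>"},
            ],
        }],
    }
    for doc in flatten_sections(sample):
        print(doc["id"], doc["section_title"])
```

Keeping chapter_no and section_no on every document is one way to support "the match plus its neighbours": once the best hit is known, the adjacent sections can be fetched with a second query on the same isbn and chapter_no. The same idea applies if the smallest entity is a paragraph rather than a section.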

You can try each approach in its own index and see how it goes for your use cases.
