Elasticsearch: multiple indexes vs. one index with types for different data sets?


Problem description


I have an application developed using the MVC pattern, and I would now like to index several of its models. Each model has a different data structure.

  • Is it better to use multiple indexes, one for each model, or to have a type for each model within the same index? I think the two approaches would also require different search queries. I have only just started on this.

  • Are there performance differences between the two approaches when the data set is small or huge?

I would test the second question myself if somebody could recommend some good sample data for that purpose.

Solution

There are different implications to both approaches.

Assuming you are using Elasticsearch's default settings (5 primary shards per index, which was the default before version 7.0), having one index per model significantly increases your shard count: each index uses 5 shards, so 5 data models use 25 shards, whereas 5 object types in a single index still use only 5 shards.
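The shard arithmetic can be sketched in a few lines. This is only an illustration: the function name `total_primary_shards` is made up for this example, the value 5 is the per-index default assumed in this answer, and replicas are ignored.

```python
def total_primary_shards(num_indices: int, shards_per_index: int = 5) -> int:
    """Primary shards created for `num_indices` indices.

    shards_per_index=5 is the per-index default assumed in this answer;
    replica shards are not counted.
    """
    return num_indices * shards_per_index


# One index per model, with 5 models.
per_model = total_primary_shards(num_indices=5)
# One shared index holding 5 object types.
shared = total_primary_shards(num_indices=1)
print(per_model, shared)
```

If your cluster uses a different shard count per index, pass it as `shards_per_index` to see how the totals change.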

Implications for having each data model as index:

  • Searching within an index is efficient and fast, because the data is spread across different indices and each shard therefore holds less of it.
  • Searching a combination of data models across 2 or more indices adds overhead, because the query has to be sent to more shards across indices and the results have to be merged before they are returned to the user.
  • Not recommended if your data set is small, since every additional shard costs extra storage and the performance gain is marginal.
  • Recommended if your data set is big and your queries take a long time to process, since dedicated shards store each model's data, which makes it easier for Elasticsearch to process.

Implications for having each data model as an object type within an index:

  • More data is stored within the 5 shards of the index, which means less overhead when you query across different data models, but your shards will be significantly bigger.
  • With more data in each shard, Elasticsearch takes longer to search, since there are more documents to filter.
  • Not recommended if you know you will be handling around 1 terabyte of data and you are not distributing it across different indices or multiple shards in your Elasticsearch index settings.
  • Recommended for small data sets, because each shard takes up space on your hardware and you avoid wasting storage for a marginal performance gain.
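The two layouts also lead to slightly different search requests, as the question suspects. The sketch below only builds the REST path and sends nothing. The helper `search_path` and the index and type names are hypothetical, and the type segment applies only to older Elasticsearch versions that still support mapping types.

```python
from typing import Optional, Sequence


def search_path(indices: Sequence[str], types: Optional[Sequence[str]] = None) -> str:
    """Build the REST path for a search request.

    Several indices (or types) are joined with commas. The type segment
    is only meaningful on older Elasticsearch versions that still
    support mapping types.
    """
    if not indices:
        raise ValueError("at least one index is required")
    path = "/" + ",".join(indices)
    if types:
        path += "/" + ",".join(types)
    return path + "/_search"


# Hypothetical names: one index per model.
print(search_path(["users", "orders"]))  # /users,orders/_search
# Hypothetical names: one shared index with a type per model.
print(search_path(["app"], types=["user", "order"]))  # /app/user,order/_search
```

In both cases the query body can stay the same; only the target of the request differs.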

As for what counts as too much data versus a small data set: it typically depends on the processor speed and RAM of your hardware, the amount of data you store in each field of your Elasticsearch mapping, and your query requirements. Using many facets in your queries will slow down your response time significantly. There is no straightforward answer to this, and you will have to benchmark according to your needs.
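A benchmark can start as a small timing harness like the one below. This is a minimal sketch, not a full load test: `benchmark` and `fake_query` are illustrative names, and `fake_query` is a stand-in workload that you would replace with a real search call against each index layout.

```python
import statistics
import time
from typing import Callable, List


def benchmark(run_query: Callable[[], object], repeats: int = 20) -> dict:
    """Time `run_query` repeatedly and summarise latency in milliseconds."""
    timings: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append((time.perf_counter() - start) * 1000.0)
    return {
        "median_ms": statistics.median(timings),
        "max_ms": max(timings),
        "runs": len(timings),
    }


def fake_query() -> int:
    # Stand-in workload (assumption); replace with a real search call.
    return sum(range(10_000))


result = benchmark(fake_query, repeats=5)
print(result["runs"], "runs, median", round(result["median_ms"], 3), "ms")
```

Running the same harness against both layouts with your own data and queries gives a like-for-like comparison on your hardware.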
