Elasticsearch: multiple indexes vs. one index with types for different data sets?


Question

I have an application developed using the MVC pattern, and I would now like to index several of its models. Each model has a different data structure.


  • Is it better to use multiple indexes, one for each model, or to have a type within the same index for each model? I think each approach would also require a different search query. I have only just started with this.

Are there performance differences between the two approaches when the data set is small versus when it is huge?

I would test the second question myself if somebody could recommend some good sample data for that purpose.

Recommended answer

The two approaches have different implications.

Assuming you are using Elasticsearch's default settings (5 primary shards per index, which was the default before version 7.0), having one index per model significantly increases your shard count: each index uses 5 shards, so 5 data models use 25 shards. Having 5 object types in one index still uses only 5 shards.
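The shard arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, the default of 5 shards per index is the assumption stated in the answer, and replica shards are ignored.

```python
def total_primary_shards(num_indices: int, shards_per_index: int = 5) -> int:
    """Primary shards created when every index uses the same shard count.

    shards_per_index defaults to 5 only because the answer assumes that
    default; replicas are deliberately left out of the count.
    """
    return num_indices * shards_per_index


# One index per model, five models: 5 indices x 5 shards each.
per_model_layout = total_primary_shards(num_indices=5)

# One shared index holding five object types: 1 index x 5 shards.
shared_layout = total_primary_shards(num_indices=1)

print(per_model_layout, shared_layout)  # 25 5
```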

Implications of having each data model as its own index:


  • Searching within a single index is efficient and fast, because each shard holds less data when the data is spread across different indices.
  • Searching a combination of data models from two or more indices generates overhead, because the query has to be sent to more shards across indices, and the results have to be merged before being sent back to the user.
  • Not recommended if your data set is small, since every additional shard costs extra storage and the performance gain is marginal.
  • Recommended if your data set is big and your queries are taking a long time to process, since dedicated shards store each model's data and that makes it easier for Elasticsearch to process.
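As a rough sketch of the index-per-model layout, the snippet below only builds the REST requests (method, path, JSON body) rather than sending them, so it runs without a cluster. The helper names, the model names `users` and `products`, and the `name` field are hypothetical; the `number_of_shards` / `number_of_replicas` settings and the comma-separated multi-index search path are the pieces taken from the Elasticsearch REST API.

```python
import json


def create_index_request(index, shards=5, replicas=1):
    """Build (method, path, body) for creating one index with explicit shard settings."""
    body = {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}
    return "PUT", f"/{index}", json.dumps(body)


def search_request(indices, query):
    """Build (method, path, body) for a single search that spans several indices."""
    # Index names are joined with commas, e.g. /users,products/_search
    path = "/" + ",".join(indices) + "/_search"
    return "POST", path, json.dumps({"query": query})


models = ["users", "products"]  # hypothetical model names

# A small data set rarely needs many shards per index, so use 1 here.
for model in models:
    print(create_index_request(model, shards=1))

# One query fanned out to every shard of both indices.
print(search_request(models, {"match": {"name": "alice"}}))
```

Keeping the shard count explicit per index is one way to limit the extra storage mentioned above when the per-model indices are small.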

Implications of having each data model as an object type within one index:


  • More data will be stored within the 5 shards of the index, so there is less overhead when you query across different data models, but each shard will be significantly larger.
  • More data within each shard takes Elasticsearch longer to search through, since there are more documents to filter.
  • Not recommended if you know you will be handling around 1 terabyte of data and you are not distributing it across different indices or multiple shards in your Elasticsearch configuration.
  • Recommended for small data sets, because you will not waste storage space for a marginal performance gain; each shard takes up space on your hardware.
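To put numbers on the shard-size point, here is a small worked estimate. It assumes, purely for illustration, that about 1 TB (taken as 1000 GB) is spread evenly across models and shards, that each index has 5 primary shards, and that replicas are ignored; the function name is hypothetical.

```python
def gb_per_shard(total_gb: float, num_indices: int, shards_per_index: int = 5) -> float:
    """Average data per primary shard, assuming an even spread (a simplification)."""
    return total_gb / (num_indices * shards_per_index)


total_gb = 1000  # roughly 1 TB, the figure used in the answer

# One shared index with 5 shards: every shard carries a fifth of everything.
print(gb_per_shard(total_gb, num_indices=1))  # 200.0

# Five per-model indices with 5 shards each: 25 shards share the same data.
print(gb_per_shard(total_gb, num_indices=5))  # 40.0
```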

What counts as too much data versus a small amount? Typically it depends on the processor speed and RAM of your hardware, the amount of data you store in each field of your Elasticsearch mapping, and your query requirements; using many facets in your queries will slow down your response time significantly. There is no straightforward answer, and you will have to benchmark according to your needs.
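A benchmark for this can start as a simple timing loop. The sketch below is a generic harness, not an Elasticsearch tool: `benchmark` and `fake_query` are hypothetical names, and `fake_query` is a stand-in that should be replaced with a function issuing a real search against each layout you want to compare.

```python
import statistics
import time


def benchmark(run_query, repeats=20, warmup=3):
    """Time a zero-argument callable and summarize the latencies in milliseconds."""
    # Warm-up runs are discarded so caches do not skew the first measurements.
    for _ in range(warmup):
        run_query()

    timings_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "runs": len(timings_ms),
        "median_ms": statistics.median(timings_ms),
        "max_ms": max(timings_ms),
    }


def fake_query():
    # Placeholder workload; swap in a real search call for each index layout.
    sum(range(10_000))


print(benchmark(fake_query))
```

Running the same harness against both layouts with the same queries and the same sample data gives figures that can be compared directly.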
