Elasticsearch: when a document is stored, does it get split up into different shards?

Problem description

I am reading a book about Elasticsearch, but this part is unclear to me. I tried looking at the documentation (it doesn't really talk much about its architecture) and at other posts, but I cannot seem to find a related one.

Say I have a document as follows: {"message": "hello world Welcome to Elastic"}

  1. When it gets inserted into Elasticsearch, it goes through the analysis phase and becomes ["hello", "world", "welcome", "to", "elastic"].
    So is each term now spread across different shards?

Elasticsearch is referred to as "distributed data storage". Is that because documents get distributed across different shards? The book says: "if you create Elasticsearch in a distributed environment, one index can be distributed across different nodes". Does this mean that a subset of the shards belonging to index1 is stored in another index?

Recommended answer

  1. No, a document is never split across different shards. The document ID (more precisely, the routing value, which defaults to the document ID) is hashed, and that hash determines which shard the document is stored on. The fields of the document are analyzed into tokens, but all of those by-products are stored on the same shard as the document.
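
To make point 1 concrete, here is a minimal toy sketch in Python (not Elasticsearch code). It assumes a hypothetical index with 2 primary shards, uses zlib.crc32 as a stand-in for the Murmur3 hash Elasticsearch actually applies to the routing value (its real formula also involves a routing-shard factor), and approximates the standard analyzer with "lowercase and split on non-word characters". The point it shows: the shard is chosen once per document, and every token produced by analysis goes into that same shard's inverted index.

```python
import re
import zlib

NUM_PRIMARY_SHARDS = 2  # hypothetical index setting (number_of_shards)


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def pick_shard(routing, num_shards=NUM_PRIMARY_SHARDS):
    """Hash the routing value (the document ID by default) and map it to ONE shard.
    crc32 is a stand-in; Elasticsearch uses Murmur3 and a slightly more involved formula."""
    return zlib.crc32(routing.encode("utf-8")) % num_shards


# Each shard keeps its own stored documents and its own inverted index (term -> doc IDs).
shards = [{"docs": {}, "inverted": {}} for _ in range(NUM_PRIMARY_SHARDS)]


def index_document(doc_id, doc):
    shard_no = pick_shard(doc_id)
    shard = shards[shard_no]
    shard["docs"][doc_id] = doc  # the whole document lands on one shard
    for value in doc.values():
        for term in analyze(value):  # every token from analysis...
            shard["inverted"].setdefault(term, set()).add(doc_id)  # ...goes to the same shard
    return shard_no


shard_no = index_document("1", {"message": "hello world Welcome to Elastic"})
print(shard_no, sorted(shards[shard_no]["inverted"]))
```

Running it prints the chosen shard number followed by the five terms; the other shard's inverted index stays empty, because terms are never scattered independently of their document.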

Distributed data storage means that an index is partitioned into shards, and those shards can be located on different nodes. So, let's say you have one index with 2 primary shards. If you have 1 node in your cluster, it will get both shards of the index. If you have two nodes, each node will get one primary shard. If you have three nodes, one node will get nothing, because shards cannot be split further. If you decide to add one replica shard per primary shard, you have four shards (2 primaries + 2 replicas), and the third node will then certainly get at least one shard (primary or replica).
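
The two-primary-shard scenario can be replayed with a toy allocator. This is a hypothetical sketch, not Elasticsearch's real balancer (which weighs many more factors): it places each shard copy on the least-loaded node that does not already hold a copy of the same shard, and leaves the copy unassigned if no such node exists.

```python
def allocate(num_primaries, replicas_per_primary, nodes):
    """Toy allocator: least-loaded node first, never two copies of one shard on a node."""
    assignment = {node: [] for node in nodes}
    unassigned = []
    for shard in range(num_primaries):
        copies = [("primary", shard)] + [("replica", shard)] * replicas_per_primary
        for copy in copies:
            # Only nodes that hold no copy of this shard number are eligible.
            candidates = [n for n in nodes if all(s != shard for _, s in assignment[n])]
            if not candidates:
                unassigned.append(copy)  # e.g. a replica on a single-node cluster
                continue
            target = min(candidates, key=lambda n: len(assignment[n]))
            assignment[target].append(copy)
    return assignment, unassigned


for node_count in (1, 2, 3):
    nodes = [f"node-{i}" for i in range(1, node_count + 1)]
    for replicas in (0, 1):
        assignment, unassigned = allocate(2, replicas, nodes)
        loads = {n: len(copies) for n, copies in assignment.items()}
        print(f"{node_count} node(s), {replicas} replica(s): {loads}, unassigned={len(unassigned)}")
```

With 3 nodes and no replicas one node stays empty; with one replica per primary all three nodes receive at least one shard copy. On a single node the replicas cannot be placed at all, since a replica may not share a node with its primary.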

Since a picture is worth a thousand words, here is one that illustrates the distributed nature of Elasticsearch pretty well.

So the main takeaways are:

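  • An index is partitioned into one or more primary shards (= bold green squares)
  • A primary shard can have zero or more replica shards (= dotted green squares)
  • All primary and replica shards of an index belong to that index (= the logstash-* columns)
  • A cluster can have multiple data nodes (= the elasticsearch-* rows)
  • Shards (whether primary or replica) are distributed across all data nodes of the cluster (all the cells in the picture above). It is also worth noting that a primary shard and its replica shards can never be located on the same node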
