How do Elasticsearch incremental snapshots deal with deleted docs?
Question
I regularly take snapshots of my ES cluster to an S3 bucket. If I delete old docs from the cluster and regularly add new ones, how does ES handle this when it takes a snapshot? Do the docs get deleted from my previous snapshots as well, or how does ES keep a backup of my docs? Please explain.
Recommended answer
When ES takes a snapshot, it does not snapshot individual docs; it snapshots segments. The segments, of course, contain the docs.
To understand what "incremental" means, consider the following example.
Say there's an index called my_index with 1 primary shard (shard 0). As data is written to the index, ES creates segment file(s) for the shard.
Initially, the index my_index may look like:
"my_index"
"consists of shard 0"
"shard 0 consists of segments A,B,C"
You take Snapshot S1 of index my_index at time T1.
The snapshot S1 contains the following metadata:
Index: my_index
Shards: 0
Segments: A,B,C
ES then copies segment files A, B, and C to the repository.
Now you index more data. ES merges segments B and C into a new segment D and adds a new segment E for the new data. Once segments are merged, the old segments are deleted from the shard. Deletions work the same way: a deleted document is first only marked as deleted in its segment, and it is physically removed when that segment is merged into a new one.
Now shard 0 of index my_index contains segments A,D,E
You take Snapshot S2 of index my_index at time T2. S2 checks which files it needs.
It will NOT copy segment A (because it already exists in the repo - this is what is meant by incremental).
It will copy segment D
It will copy segment E
The snapshot S2 contains the following metadata:
Index: my_index
Shards: 0
Segments: A,D,E
What is incremental here? The incremental nature applies to new segment files, not necessarily to new data. For Snapshot S2, segment A was NOT copied because it was already in the repo from S1.
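The file-level bookkeeping described above can be sketched in a few lines of Python. This is only a toy model: the take_snapshot function, the repo dict, and the single-letter segment names are hypothetical and do not reflect Elasticsearch's actual repository format or API.

```python
def take_snapshot(repo, name, shard_segments):
    """Toy model: record the snapshot's segment list and copy only new files."""
    # Collect every segment file that some existing snapshot already stored.
    already_stored = set()
    for segments in repo.values():
        already_stored.update(segments)
    # Only files missing from the repository need to be copied.
    copied = sorted(set(shard_segments) - already_stored)
    # The snapshot metadata still lists every segment it depends on.
    repo[name] = sorted(shard_segments)
    return copied


repo = {}  # hypothetical repository: snapshot name -> referenced segments
copied_s1 = take_snapshot(repo, "S1", ["A", "B", "C"])
copied_s2 = take_snapshot(repo, "S2", ["A", "D", "E"])
print(copied_s1)  # ['A', 'B', 'C']
print(copied_s2)  # ['D', 'E']
```

S2 still references segment A in its metadata even though it did not copy it, which is why each snapshot can be restored on its own.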
What happens when you delete Snapshot S1?
1. Segments B and C will be deleted from the repo since they are no longer referenced by any snapshot
2. Segment A is kept since it is still referenced by Snapshot S2
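A minimal sketch of that cleanup rule, assuming the repository can be modeled as a mapping from snapshot name to the segment files it references. The delete_snapshot function and the repo dict are hypothetical names used for illustration, not Elasticsearch internals.

```python
def delete_snapshot(repo, name):
    """Toy model: drop a snapshot and return the files nothing else references."""
    removed = set(repo.pop(name))
    # Gather the segment files the remaining snapshots still depend on.
    still_referenced = set()
    for segments in repo.values():
        still_referenced.update(segments)
    # Only files with no remaining reference can be removed from the repo.
    return sorted(removed - still_referenced)


repo = {"S1": ["A", "B", "C"], "S2": ["A", "D", "E"]}
deleted_files = delete_snapshot(repo, "S1")
print(deleted_files)  # ['B', 'C']
```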
What happens when you delete index my_index?
The snapshots will still contain the segment files pertaining to my_index, allowing you to restore the index at any time.
What happens when documents are deleted? When docs are deleted, the segment files are eventually merged and new segments are created. So when you take a snapshot after a document has been deleted, that snapshot will not have the document. Snapshots taken before the deletion are not modified and still contain it.
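To make this concrete, here is a toy simulation, assuming segments can be modeled as sets of doc IDs. The merge function, the doc IDs, and the dict layout are all hypothetical; real segments are Lucene files, not Python sets.

```python
def merge(segments, deleted_ids):
    """Toy merge: combine segments, dropping documents marked as deleted."""
    merged = set()
    for docs in segments:
        merged |= docs
    return merged - deleted_ids


# Shard at time T1: segment name -> doc IDs it holds (hypothetical data).
shard = {"A": {1, 2}, "B": {3}, "C": {4}}
snapshot_s1 = {name: set(docs) for name, docs in shard.items()}

# Doc 3 is deleted, then B and C are merged into a new segment D.
deleted = {3}
shard = {"A": shard["A"], "D": merge([shard["B"], shard["C"]], deleted)}
snapshot_s2 = {name: set(docs) for name, docs in shard.items()}

docs_in_s1 = set().union(*snapshot_s1.values())
docs_in_s2 = set().union(*snapshot_s2.values())
print(3 in docs_in_s1)  # True
print(3 in docs_in_s2)  # False
```

The earlier snapshot keeps its own copy of segment B, so restoring S1 would bring doc 3 back, while restoring S2 would not.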
Hope this helps.