How do Elasticsearch incremental snapshots deal with deleted docs?


Question

I regularly take snapshots of my ES cluster into an S3 bucket. I delete old docs from the cluster and add new docs on a regular basis. After I take a new snapshot, how does ES deal with this scenario: do the deleted docs also disappear from my previous snapshots, or how does ES keep a backup of my docs? Please explain.

Recommended answer

When ES takes a snapshot, it doesn't snapshot individual docs; it snapshots segments. Of course, the segments contain the docs.

To understand what incremental means here, consider the example below.

Say there's an index called my_index with 1 primary shard (shard 0). As data gets written to the index, ES creates segment file(s) for the shard.

Initially, the index my_index may look like:

"my_index"
"consists of shard 0"
"shard 0 consists of segments A,B,C"

You take Snapshot S1 of index my_index at time T1.

The snapshot S1 contains the following metadata:

Index: my_index
Shards: 0
Segments: A,B,C
And then it will copy the segment files.

Now, you index more data. ES merges segments B and C into a new segment D and adds a new segment E for the new data. Once segments are merged, the old segments are deleted from the shard. In the same way, when documents are deleted, they are removed for good once their segments are merged.

Now shard 0 of index my_index contains segments A,D,E.

You take Snapshot S2 of index my_index at time T2. S2 checks which files it needs.

It will NOT copy segment A (because it already exists in the repo - this is what is meant by incremental).
It will copy segment D.
It will copy segment E.
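
The file selection for S2 can be sketched as a set difference. This is only a toy model of the idea, not how Elasticsearch is implemented: the function name `files_to_copy` is hypothetical, and segments are represented by the single-letter names used in this example.

```python
def files_to_copy(shard_segments, repo_segments):
    """Return the segments a new snapshot must upload.

    Toy model (an assumption for illustration): a segment is uploaded
    only if the repository does not already hold it.
    """
    return sorted(set(shard_segments) - set(repo_segments))


# Repository contents after snapshot S1.
repo = {"A", "B", "C"}
# Shard 0 at time T2, after the merge and the new writes.
shard_at_t2 = {"A", "D", "E"}

to_copy = files_to_copy(shard_at_t2, repo)
print(to_copy)  # ['D', 'E']
```

Segment A is skipped because the repo already has it from S1; only D and E are new.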

The snapshot S2 contains the following metadata:

Index: my_index
Shards: 0
Segments: A,D,E

What is incremental here? The incremental nature applies to new segment files, not necessarily to new data. For Snapshot S2, segment A was NOT copied because it was already contained in S1.

What happens when you delete Snapshot S1?
1. Segments B and C are deleted from the repo, since no snapshot references them any longer.
2. Segment A is kept, since Snapshot S2 still references it.
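
The two rules above amount to reference counting across snapshots. Here is a minimal sketch of that idea, assuming a hypothetical `delete_snapshot` helper and a plain dict standing in for the repository metadata; it is not Elasticsearch code.

```python
def delete_snapshot(snapshots, name):
    """Remove a snapshot and return the segments that can be deleted.

    `snapshots` maps a snapshot name to the set of segments it references
    (a hypothetical stand-in for the repository metadata). A segment is
    removable only if no remaining snapshot references it.
    """
    removed = snapshots.pop(name)
    still_referenced = set().union(*snapshots.values())
    return sorted(removed - still_referenced)


snapshots = {
    "S1": {"A", "B", "C"},
    "S2": {"A", "D", "E"},
}

deleted = delete_snapshot(snapshots, "S1")
print(deleted)  # ['B', 'C']
```

Segment A survives because S2 still points at it, so S2 stays fully restorable after S1 is gone.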

What happens when you delete index my_index?
The snapshots still contain the segment files pertaining to my_index, so you can restore the index at any time.

What happens when documents are deleted? When docs are deleted, the segment files are eventually merged and new segments are created. So a snapshot taken after a document has been deleted will not contain that document, while snapshots taken before the deletion still reference the older segments and therefore still contain it.
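
To make the effect on earlier snapshots concrete, here is a small simulation. It is a sketch under simplifying assumptions: segments are modeled as immutable sets of document IDs, a delete is modeled as an immediate merge that writes a new segment, and all names (`take_snapshot`, `docs_in`, `doc1`, and so on) are made up for illustration.

```python
# Toy model: a segment is an immutable set of document IDs.
segments = {"A": frozenset({"doc1", "doc2"}), "B": frozenset({"doc3"})}
repo = {}        # segment name -> contents stored in the repository
snapshots = {}   # snapshot name -> segment names it references


def take_snapshot(name):
    """Upload segments the repo lacks and record what the snapshot references."""
    for seg, docs in segments.items():
        repo.setdefault(seg, docs)
    snapshots[name] = set(segments)


def docs_in(name):
    """Return every document reachable from the given snapshot."""
    return set().union(*(repo[seg] for seg in snapshots[name]))


take_snapshot("S1")

# Delete doc2: modeled as a merge that rewrites segment A as a new
# segment C without the deleted document.
segments["C"] = segments.pop("A") - {"doc2"}

take_snapshot("S2")

print(sorted(docs_in("S1")))  # ['doc1', 'doc2', 'doc3']
print(sorted(docs_in("S2")))  # ['doc1', 'doc3']
```

S1 still restores doc2 because the repo keeps segment A for as long as S1 references it; S2 only sees the post-delete segments.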

Hope this helps.
