Do Elasticsearch CRUD operations need a refresh?


Question

I need to sync data from an RDBMS with Elasticsearch. The common approach is to apply changes on the RDBMS and then use a message queue (or a table used for ETL) to apply the same changes on ES.

The Elasticsearch blog suggests popping 1000 messages from the queue and pushing them in a single bulk request containing inserts, updates and deletes.
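As a rough illustration of such a batch, the sketch below builds the newline-delimited body that the `_bulk` endpoint accepts from a list of queued changes. The message shape (`op`, `id`, `doc`), the index name `items`, and the helper `build_bulk_body` are assumptions made for this example, not something taken from the blog.

```python
import json

VALID_OPS = {"index", "update", "delete"}

def build_bulk_body(messages, index="items"):
    """Turn queued change messages into an NDJSON body for the _bulk API.

    Each message is a hypothetical dict of the form
    {"op": "index" | "update" | "delete", "id": ..., "doc": {...}}.
    """
    lines = []
    for msg in messages:
        op = msg["op"]
        if op not in VALID_OPS:
            raise ValueError(f"unsupported op: {op}")
        # Action line: names the operation and the explicit document ID.
        lines.append(json.dumps({op: {"_index": index, "_id": str(msg["id"])}}))
        if op == "index":
            lines.append(json.dumps(msg["doc"]))           # full document source
        elif op == "update":
            lines.append(json.dumps({"doc": msg["doc"]}))  # partial document
        # A delete action has no second line.
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"

queue = [
    {"op": "index", "id": 1, "doc": {"name": "a"}},
    {"op": "update", "id": 1, "doc": {"name": "b"}},
    {"op": "delete", "id": 1},
]
body = build_bulk_body(queue)
```

The resulting string would be sent as the body of a POST to `/_bulk` with `Content-Type: application/x-ndjson`.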

It is known that ES is near real-time: a refresh is needed before changes become visible to search requests.

Given this, the question is: do CRUD operations with an explicit ID (GET, INSERT, UPDATE, DELETE) need a refresh when performed in a row? In other words: are consecutive CRUD operations real-time?

From the few articles I have read, it looks like they don't need a refresh and are applied in real time, but I would like confirmation.

To be clear: I don't need to perform search requests (where a refresh is needed to make changes visible), only access by explicit ID. I don't mind when these changes become visible to searches.

If two CRUD requests are performed in a row on ES:


  1. INDEX document with id=1
  2. UPDATE (or DELETE) document with id=1

Does 2) need to wait for a refresh to see 1)?

If so, I don't see a way to achieve consistency between an RDBMS and ES, because consecutive operations would end up with an updated (or deleted) document on the RDBMS but would fail on ES for lack of a refresh.

Recommended answer

Short answer:

You don't need a refresh. Operations on a document are consistent, meaning they are executed in order, and ES makes sure the latest request always wins. It also makes the changes persistent on every index/update/delete request.
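To make the distinction concrete, here is a toy in-memory model (not Elasticsearch's actual internals) in which access by ID always sees the latest write, while search only sees what was present at the last refresh. The class `ToyShard` and all of its methods are hypothetical names invented for this sketch.

```python
class ToyShard:
    """Toy model: ID lookups see writes immediately, search sees them after refresh."""

    def __init__(self):
        self.live = {}        # latest state per ID, updated on every write
        self.searchable = {}  # snapshot visible to search, updated on refresh

    def index(self, doc_id, doc):
        self.live[doc_id] = dict(doc)

    def update(self, doc_id, partial):
        # An update by ID reads the live state, so it sees a preceding index.
        if doc_id not in self.live:
            raise KeyError(f"document {doc_id} not found")
        self.live[doc_id].update(partial)

    def delete(self, doc_id):
        del self.live[doc_id]

    def get(self, doc_id):
        return self.live.get(doc_id)

    def search(self, field, value):
        return [i for i, d in self.searchable.items() if d.get(field) == value]

    def refresh(self):
        # Publish a copy of the live state to search.
        self.searchable = {i: dict(d) for i, d in self.live.items()}

shard = ToyShard()
shard.index(1, {"name": "a"})
shard.update(1, {"name": "b"})       # no refresh needed between the two writes
by_id = shard.get(1)                 # sees the update immediately
before = shard.search("name", "b")   # not visible to search yet
shard.refresh()
after = shard.search("name", "b")    # visible after the refresh
```

In this model the index followed by an update from the question succeeds without any refresh; only the `search` call depends on one.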

If two write requests for the same ID arrive at different network partitions and the later one succeeds first, the earlier one will not be applied, because consistency is achieved through versioning. The latest version of the data always wins.
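The idea can be sketched with a small versioned store that accepts a write only if its version is strictly higher than the stored one, which mirrors the semantics of `version_type=external` in Elasticsearch. The store itself, the `VersionConflict` exception, and the version numbers are assumptions for illustration; in a sync setup the version would typically come from the RDBMS (for example a row version or timestamp).

```python
class VersionConflict(Exception):
    """Raised when a write carries a version that is not newer than the stored one."""

class VersionedStore:
    def __init__(self):
        self.docs = {}  # doc_id -> (version, doc)

    def write(self, doc_id, doc, version):
        current = self.docs.get(doc_id)
        # Accept only strictly newer versions; stale writes are rejected.
        if current is not None and version <= current[0]:
            raise VersionConflict(
                f"id={doc_id}: incoming version {version} <= stored {current[0]}"
            )
        self.docs[doc_id] = (version, doc)

store = VersionedStore()
store.write(1, {"name": "new"}, version=2)      # the later request arrives first

rejected = False
try:
    store.write(1, {"name": "old"}, version=1)  # the earlier request arrives late
except VersionConflict:
    rejected = True
```

The late, older write is rejected, so the stored document stays at version 2 regardless of arrival order.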

Long answer:

You need to look at many concepts, such as the translog, fsync, consistency in ES, optimistic concurrency control, versioning, partitioning, and availability.

ES achieves consistency using versioning. When you send an index/update/delete request, it does the following at a high level:


  1. Writes the operation to the translog.
  2. Makes it persistent. There is a default interval property; the translog is fsynced either when that interval elapses or after every index/delete/update operation.
  3. Sends the request to a node.
  4. The node that received the request identifies the leader of the partition the data belongs to (the primary shard, in Elasticsearch terms).
  5. The partition leader node writes the data and forwards it to the replica nodes where this partition should be replicated.
  6. Once all have acknowledged, the status is returned to the client via the node that initially received the request.
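The flow above can be sketched as a toy simulation: route the document ID to a shard group, append the operation to the primary's translog, forward it to the replicas, and report success once every copy has acknowledged. Everything here is hypothetical (`route`, `ShardCopy`, `ToyCluster`), and `zlib.crc32` is only a stand-in for the real routing hash.

```python
import zlib

def route(doc_id, num_primaries):
    # Stand-in routing: hash the ID and take it modulo the number of primaries.
    return zlib.crc32(str(doc_id).encode()) % num_primaries

class ShardCopy:
    """One copy (primary or replica) of a shard, with its own translog."""

    def __init__(self):
        self.translog = []  # append-only record of operations
        self.docs = {}

    def apply(self, op, doc_id, doc=None):
        self.translog.append((op, doc_id, doc))
        if op == "delete":
            self.docs.pop(doc_id, None)
        else:
            self.docs[doc_id] = doc
        return True  # acknowledgement

class ToyCluster:
    def __init__(self, num_primaries=2, replicas=1):
        self.groups = [
            {"primary": ShardCopy(),
             "replicas": [ShardCopy() for _ in range(replicas)]}
            for _ in range(num_primaries)
        ]

    def write(self, op, doc_id, doc=None):
        group = self.groups[route(doc_id, len(self.groups))]
        acks = [group["primary"].apply(op, doc_id, doc)]  # primary first
        acks += [r.apply(op, doc_id, doc) for r in group["replicas"]]
        return all(acks)  # status returned to the client

cluster = ToyCluster()
ok_index = cluster.write("index", 1, {"name": "a"})
ok_delete = cluster.write("delete", 1)
group = cluster.groups[route(1, 2)]
```

Because both operations for id=1 are routed to the same primary and applied in arrival order, the delete sees the preceding index without any refresh in this model.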

Many concepts and algorithms go into making this a powerful distributed system.
