Elasticsearch replication of other system data?


Question

Suppose I want to use elasticsearch to implement a generic search on a website. The top search bar would be expected to find resources of all different kinds across the site. Documents for sure (uploaded/indexed via Tika), but also things like clients, accounts, other people, etc.

For architectural reasons, most of the non-document stuff (clients, accounts) will exist in a relational database.

When implementing this search, option #1 would be to create document versions of everything, and then just use elasticsearch to run all aspects of the search, relying not at all on the relational database for finding different types of objects.

Option #2 would be to use elasticsearch only for indexing the documents, which would mean for a general "site search" feature, you'd have to farm out multiple searches to multiple systems, then aggregate the results before returning them.
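The fan-out-and-aggregate approach of option #2 can be sketched in a few lines. The two per-store search functions below are hypothetical stand-ins (a real app would call Elasticsearch and the relational database); the point is the parallel dispatch and the merge step:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-store search functions; in a real application these would
# query Elasticsearch and the relational database respectively.
def search_documents(query):
    return [{"type": "document", "title": "Q3 report", "score": 0.9}]

def search_clients(query):
    return [{"type": "client", "title": "Acme Corp", "score": 0.7}]

def site_search(query):
    # Fan the query out to every store in parallel, so total latency is
    # bounded by the slowest store rather than the sum of all of them.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(fn, query)
                   for fn in (search_documents, search_clients)]
        results = [hit for f in futures for hit in f.result()]
    # Aggregate: merge into one list, best scores first. Note that scores
    # from different engines are not directly comparable, which is one more
    # weakness of this approach.
    return sorted(results, key=lambda hit: hit["score"], reverse=True)
```

Even this toy version shows the two drawbacks discussed below: the merging logic lives in your application, and `site_search` cannot return until the slowest store has answered.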

Option #1 seems far superior, but the downside is that it requires that elastic search in essence have a copy of a great many things in the production relational database, plus that those copies be kept fresh as things change.
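For option #1, the "copy of a great many things" usually means flattening heterogeneous database rows into uniformly shaped search documents. A minimal sketch, where the field names and row shapes are illustrative assumptions rather than a fixed schema:

```python
# Flatten a database row of any entity type into a search document.
def to_search_doc(entity_type, row):
    return {
        "_id": f"{entity_type}-{row['id']}",  # stable id, so re-syncs overwrite
        "type": entity_type,                  # lets the UI label/filter results
        "title": row.get("name") or row.get("title", ""),
        "body": " ".join(str(v) for v in row.values()),  # crude full-text blob
    }

client_row = {"id": 42, "name": "Acme Corp", "city": "Berlin"}
doc = to_search_doc("client", client_row)
# `doc` could then be sent to Elasticsearch, e.g. via its bulk API.
```

The stable `_id` is the important design choice: it makes every sync idempotent, since re-indexing an unchanged or updated row simply overwrites the previous copy.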

What's the best option for keeping these stores in sync, and am I correct in thinking that for general search, option #1 is superior? Is there an option #3?

Answer

You've pretty much listed the two main options there are when it comes to search across multiple data stores, i.e. search in one central data store (option #1) or search in all data stores and aggregate the results (option #2).

Both options would work, although option #2 has two main drawbacks:

  1. It requires you to develop a significant amount of logic in your application in order to "fan out" the search to the multiple data stores and to aggregate the results you get back.
  2. The response times of each data store may differ, so you'd have to wait for the slowest store to respond before presenting the search results to the user (unless you circumvent this by using different asynchronous techniques, such as Ajax, websockets, etc.)

If you want to provide a better and more reliable search experience, option #1 would clearly get my vote (and it's the way I go most of the time, actually). As you've correctly stated, the main "drawback" of this option is that you need to keep Elasticsearch in sync with the changes in your other master data stores.

Since your other data stores will be relational databases, you have a few different options to keep them in sync with Elasticsearch, namely:

  1. Use the Logstash JDBC input plugin to periodically pull new and changed rows out of your tables and index them.
  2. Use a standalone JDBC importer tool to do the same without running a full Logstash pipeline.

These first two options work great but have one main disadvantage: they don't capture DELETEs on your tables, only INSERTs and UPDATEs. This means that if you ever delete a user, account, etc., you will not be able to know that you have to delete the corresponding document in Elasticsearch. Unless, of course, you decide to delete the Elasticsearch index before each import session.

To alleviate this, you can use another tool which bases itself on the MySQL binlog and will thus be able to capture every event. There's one written in Go, one in Java and one in Python.

UPDATE:

Here is another interesting blog article on the subject: How to keep Elasticsearch synchronized with a relational database using Logstash

