Elasticsearch replication of other system data?


Problem description

Suppose I want to use Elasticsearch to implement a general search on a website. The top search bar would be expected to find resources of all different kinds across the site: documents for sure (uploaded/indexed via Tika), but also things like clients, accounts, other people, etc.

For architectural reasons, most of the non-document stuff (clients, accounts) will exist in a relational database.

When implementing this search, option #1 would be to create document versions of everything and then use Elasticsearch to run all aspects of the search, not relying on the relational database at all for finding the different types of objects.

Option #2 would be to use Elasticsearch only for indexing the documents, which means that for a general "site search" feature you would have to farm out multiple searches to multiple systems and then aggregate the results before returning them.

Option #1 seems far superior, but the downside is that it requires Elasticsearch to hold, in essence, a copy of a great many things in the production relational database, and that those copies be kept fresh as things change.

What's the best option for keeping these stores in sync, and am I correct in thinking that for general search, option #1 is superior? Is there an option #3?

Solution

You've pretty much listed the two main options there are when it comes to searching across multiple data stores: search in one central data store (option #1) or search in all data stores and aggregate the results (option #2).

Both options would work, although option #2 has two main drawbacks:

  1. It requires a substantial amount of logic in your application in order to "branch out" the searches to the multiple data stores and aggregate the results you get back.

  2. The response times will likely differ for each data store, so you have to wait for the slowest data store to respond before presenting the search results to the user (unless you circumvent this with asynchronous techniques such as Ajax, websockets, etc; a minimal fan-out sketch follows this list).
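To make drawback #2 concrete, here is a minimal, purely illustrative Python sketch of what option #2 forces the application to do: fan the query out to each store concurrently, wait for all of them (overall latency is dictated by the slowest store), and then merge result sets whose relevance scores are not directly comparable. The search functions and scores are hypothetical stand-ins, not real APIs.

```python
import asyncio

# Hypothetical per-store search functions; in a real application these would
# query Elasticsearch and the relational database respectively.
async def search_documents(query: str) -> list[dict]:
    await asyncio.sleep(0.1)  # stand-in for an Elasticsearch round trip
    return [{"type": "document", "title": f"Doc matching {query}", "score": 1.2}]

async def search_relational(query: str) -> list[dict]:
    await asyncio.sleep(0.3)  # stand-in for a SQL full-text query
    return [{"type": "client", "title": f"Client matching {query}", "score": 0.8}]

async def site_search(query: str) -> list[dict]:
    # Fan the query out to every data store concurrently, then merge.
    # The gather() completes only once the slowest store has answered.
    doc_hits, db_hits = await asyncio.gather(
        search_documents(query), search_relational(query)
    )
    # Naive aggregation: the scores come from different engines and are not
    # directly comparable, which is part of the logic drawback #1 refers to.
    return sorted(doc_hits + db_hits, key=lambda hit: hit["score"], reverse=True)

if __name__ == "__main__":
    print(asyncio.run(site_search("acme")))
```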

If you want to provide a better and more reliable search experience, option #1 would clearly get my vote (it is what I do most of the time, actually). As you've correctly stated, the main "drawback" of this option is that you need to keep Elasticsearch in sync with the changes in your other master data stores.

Since your other data stores will be relational databases, you have a few different options for keeping them in sync with Elasticsearch, namely:




  • using the Logstash JDBC input plugin (https://www.elastic.co/blog/logstash-jdbc-input-plugin)

  • using the JDBC importer tool
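Both of these tools implement a pull-based, incremental import: periodically select the rows that changed since the last run and push them into Elasticsearch. As a rough illustration of that pattern (a sketch of the approach, not the tools' actual code), the following assumes a `last_modified` column on the source table and uses the pymysql and elasticsearch Python client libraries; table, column and credential names are placeholders.

```python
from datetime import datetime

import pymysql
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")
db = pymysql.connect(host="localhost", user="app", password="secret",
                     database="crm", cursorclass=pymysql.cursors.DictCursor)

def sync_clients(last_run: datetime) -> None:
    """Index every client row created or updated since the previous run."""
    with db.cursor() as cur:
        cur.execute(
            "SELECT id, name, email FROM clients WHERE last_modified > %s",
            (last_run,),
        )
        rows = cur.fetchall()

    actions = (
        {
            "_op_type": "index",   # INSERTs and UPDATEs both become index operations
            "_index": "clients",
            "_id": row["id"],
            "_source": {"name": row["name"], "email": row["email"]},
        }
        for row in rows
    )
    helpers.bulk(es, actions)
    # Rows DELETEd from MySQL never show up in this query, which is exactly
    # the limitation described in the next paragraph.
```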



These first two options work great but have one main disadvantage: they don't capture DELETEs on your tables, they only capture INSERTs and UPDATEs. This means that if you ever delete a user, account, etc., you have no way of knowing that you must delete the corresponding document in Elasticsearch. Unless, of course, you decide to delete the Elasticsearch index before each import session.
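The "delete the index before each import session" workaround mentioned above is heavy-handed but simple. A minimal sketch with the Python Elasticsearch client (the index name is a placeholder):

```python
from elasticsearch import Elasticsearch, NotFoundError

es = Elasticsearch("http://localhost:9200")

def reset_index(name: str) -> None:
    """Drop and recreate an index so a full re-import starts from a clean slate."""
    try:
        es.indices.delete(index=name)
    except NotFoundError:
        pass  # first run: the index does not exist yet
    es.indices.create(index=name)

reset_index("clients")
# ...then run a full import, e.g. the sync_clients sketch above with a very old timestamp
```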



To alleviate this, you can use another kind of tool that bases itself on the MySQL binlog and is thus able to capture every event, including DELETEs. There is one written in Go, one in Java and one in Python.
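If you would rather not adopt one of those specific tools, the same binlog-tailing approach can be sketched directly with the python-mysql-replication library together with the Elasticsearch client (an assumption on my part; it is not necessarily the Python tool the answer refers to). Unlike the JDBC-style importers, it also sees DELETE events:

```python
from elasticsearch import Elasticsearch
from pymysqlreplication import BinLogStreamReader
from pymysqlreplication.row_event import (
    DeleteRowsEvent, UpdateRowsEvent, WriteRowsEvent,
)

es = Elasticsearch("http://localhost:9200")

# Requires a MySQL user with replication privileges and binlog_format=ROW on
# the server; host, credentials, schema and table names are placeholders.
stream = BinLogStreamReader(
    connection_settings={"host": "localhost", "port": 3306,
                         "user": "repl", "passwd": "secret"},
    server_id=100,            # must be unique among the server's replicas
    only_schemas=["crm"],
    only_tables=["clients"],
    only_events=[WriteRowsEvent, UpdateRowsEvent, DeleteRowsEvent],
    resume_stream=True,
    blocking=True,
)

for event in stream:
    for row in event.rows:
        if isinstance(event, DeleteRowsEvent):
            # DELETEs are visible here, unlike with the JDBC-based importers.
            es.delete(index="clients", id=row["values"]["id"])
        else:
            values = row["after_values"] if isinstance(event, UpdateRowsEvent) else row["values"]
            es.index(index="clients", id=values["id"],
                     document={"name": values["name"], "email": values["email"]})
```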


