What is the right way to sync/import tables from a Postgres DB to Elasticsearch?


Question

I want to import some tables from a Postgres database into Elasticsearch and also keep the Elasticsearch data in sync with those tables. I have looked at a course on Udemy and also talked with a colleague who has a lot of experience with this issue to find out the best way to do it. I was surprised to hear from both that the best way seems to be to write code in Python, Java, or some other language that handles the import and the sync, which brings me to my question. Is this actually the best way to handle this situation? I would have expected a library, plugin, or similar tool that handles importing data into Elasticsearch and keeping it in sync with an external database. What is the best way to handle this situation?

Recommended answer

It depends on your use case. A common practice is to handle this in the application layer. Basically, you replicate the actions performed on one DB to the other. So, for example, if you save an entry in Postgres, you do the same in Elasticsearch.

If you do this, however, you'll need a queuing system in place. One option is to integrate the queue into your application layer, so that, e.g., if the save in Elasticsearch fails you can replay the operation. On that queue you would also implement a throttling mechanism so as not to overwhelm Elasticsearch. The other option is to send events to another app (e.g. Logstash), so that throttling and persistence are handled by that system and not by your application.
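As a rough illustration of the application-layer approach with an integrated queue, here is a minimal Python sketch. Everything in it is an assumption made for the example: `IndexQueue`, `save_entry`, and the `send` callable are hypothetical names, the queue lives in memory (a real setup would want a persistent one), and `send` stands in for whatever call actually writes a document to Elasticsearch.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple


class IndexQueue:
    """In-memory queue that replays failed index operations and throttles sends."""

    def __init__(self, send: Callable[[Dict], None],
                 max_per_second: float = 100.0, max_attempts: int = 3) -> None:
        self.send = send  # hypothetical: writes one operation to Elasticsearch
        self.min_interval = 1.0 / max_per_second
        self.max_attempts = max_attempts
        self.pending: Deque[Tuple[Dict, int]] = deque()  # (operation, attempts so far)
        self.dead: List[Dict] = []  # operations that exhausted their retries

    def enqueue(self, operation: Dict) -> None:
        self.pending.append((operation, 0))

    def drain(self) -> None:
        """Send everything in the queue, retrying failures up to max_attempts."""
        while self.pending:
            operation, attempts = self.pending.popleft()
            try:
                self.send(operation)
            except Exception:
                if attempts + 1 < self.max_attempts:
                    # Put the operation back so it is replayed later.
                    self.pending.append((operation, attempts + 1))
                else:
                    self.dead.append(operation)
            time.sleep(self.min_interval)  # crude throttle between sends


def save_entry(db_rows: List[Dict], queue: IndexQueue, row: Dict) -> None:
    """Save to the primary store, then queue the same action for the search index."""
    db_rows.append(row)  # stand-in for the Postgres INSERT
    queue.enqueue({"op": "index", "id": row["id"], "doc": row})
```

The point of the design is that the Postgres write never waits on Elasticsearch: a failed index call only puts the operation back on the queue, and anything that keeps failing ends up in `dead` for inspection instead of being lost silently.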

Another approach would be the Logstash JDBC input plugin: https://www.elastic.co/blog/logstash-jdbc-input-plugin. Here you use another system that "polls" your database and sends the changes to Elasticsearch. Logstash is a good fit in this case, since it's part of the ELK stack and integrates well with Elasticsearch. See also the plugin documentation: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html
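To make the polling idea concrete, here is a small Python sketch of what such a poller does conceptually. It is not the Logstash plugin itself: it assumes a hypothetical `articles` table with an `updated_at` column that is bumped on every change, and it uses an in-memory SQLite database as a stand-in for Postgres so the example runs without any setup.

```python
import sqlite3
from typing import Dict, List, Tuple


def poll_changes(conn: sqlite3.Connection, last_seen: int) -> Tuple[List[Dict], int]:
    """Return rows modified after last_seen, plus the new high-water mark."""
    cursor = conn.execute(
        "SELECT id, title, updated_at FROM articles "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    )
    rows = [{"id": r[0], "title": r[1], "updated_at": r[2]} for r in cursor]
    if rows:
        last_seen = rows[-1]["updated_at"]  # remember where this poll stopped
    return rows, last_seen


# Demo: SQLite stands in for Postgres; table and column names are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, updated_at INTEGER)")
conn.execute("INSERT INTO articles VALUES (1, 'first', 100)")
conn.execute("INSERT INTO articles VALUES (2, 'second', 110)")

# First poll picks up both rows; each would be sent to Elasticsearch.
first_batch, mark = poll_changes(conn, last_seen=0)

# A later update moves the row past the mark, so the next poll sees only it.
conn.execute("UPDATE articles SET title = 'first (edited)', updated_at = 120 WHERE id = 1")
second_batch, mark = poll_changes(conn, last_seen=mark)
```

Note that a poller like this only sees rows that still exist, so hard deletes in the source table are not picked up; a soft-delete flag is one way around that.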

Another approach is to use Postgres's NOTIFY mechanism to send events to a queue, whose consumer then handles saving the changes in Elasticsearch.
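A listener for this could look roughly like the following Python sketch, based on psycopg2's `LISTEN`/`conn.poll()`/`conn.notifies` pattern. The payload format is an assumption: it presumes a trigger on the table that calls `pg_notify` with a JSON string containing `op`, `id`, and `row`. The function names, the channel name, and the `enqueue` callable are likewise hypothetical.

```python
import json
import select
from typing import Callable, Dict


def notification_to_action(payload: str) -> Dict:
    """Translate a JSON NOTIFY payload (assumed shape) into an index or delete action."""
    event = json.loads(payload)
    if event["op"] == "DELETE":
        return {"action": "delete", "id": event["id"]}
    return {"action": "index", "id": event["id"], "doc": event.get("row", {})}


def listen(dsn: str, channel: str, enqueue: Callable[[Dict], None]) -> None:
    """Block forever, forwarding each notification on `channel` to `enqueue`."""
    # Third-party driver, imported here so the helper above has no dependency.
    import psycopg2
    from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT

    conn = psycopg2.connect(dsn)
    conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)
    with conn.cursor() as cur:
        cur.execute(f"LISTEN {channel};")  # channel must be a trusted identifier
    while True:
        # Wait up to 5 seconds for the connection socket to become readable.
        if select.select([conn], [], [], 5) == ([], [], []):
            continue
        conn.poll()
        while conn.notifies:
            notify = conn.notifies.pop(0)
            enqueue(notification_to_action(notify.payload))
```

A call such as `listen("dbname=app", "table_changes", queue.put)` would then feed a queue that a separate worker drains into Elasticsearch. Two caveats worth keeping in mind: notifications are only delivered to sessions that are listening at the time, so events sent while the listener is down are missed, and the payload size is limited (under 8000 bytes in the default configuration), so large rows are better re-read by id than shipped in the payload.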
