Import old data from postgres to elasticsearch


Question

I have a lot of data in my postgres database (on a remote server). This is the data of the past year, and I want to push it to elasticsearch now.

The data has a time field in this format: 2016-09-07 19:26:36.817039+00.

I want this to be the time field (@timestamp) in elasticsearch, so that I can view it in kibana and see some visualizations over the last year.

I need help on how to push all this data efficiently. I cannot figure out how to get all this data out of postgres in the first place.

I know we can inject data via the jdbc plugin, but I think I cannot create my @timestamp field with that.

I also know about zombodb, but I am not sure whether it lets me supply my own time field either.

Also, the data is in bulk, so I am looking for an efficient solution.

I need help on how I can do this, so suggestions are welcome.

Answer


"I know we can inject data via jdbc plugin, but I think I cannot create my @timestamp field with that"

This should be doable with Logstash. The first starting point should probably be this blog post. And remember that Logstash always consists of 3 parts:


  1. Input: JDBC input. If you only need to import once, skip the schedule, otherwise set the right timing in cron syntax (a minimal input sketch follows this list).
  2. Filter: This one is not part of the blog post. You will need to use the Date filter to set the right @timestamp value — adding an example at the end.
  3. Output: This is simply the Elasticsearch output.
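
To make the input part concrete, here is a minimal sketch of a jdbc input. The driver path, connection string, credentials, and SELECT statement are all placeholders you will need to adapt to your environment:

input {
  jdbc {
    # Placeholder driver location and credentials -- adapt to your setup
    jdbc_driver_library => "/path/to/postgresql-jdbc.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_connection_string => "jdbc:postgresql://your-host:5432/your_db"
    jdbc_user => "your_user"
    jdbc_password => "your_password"
    # The query that pulls the old data; a one-off import needs no schedule
    statement => "SELECT * FROM your_table"
    # schedule => "0 * * * *"   # only needed for repeated runs (cron syntax)
  }
}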

This will depend on the format and field name of the timestamp value in PostgreSQL, but the filter part should look something like this:

date {
   # "MM" is months and "mm" is minutes in the pattern; adjust the whole
   # pattern to whatever format your date column actually holds
   match => ["your_date_field", "dd-MM-yyyy HH:mm:ss"]
   # Remove the now redundant field, since we're storing it in @timestamp
   # (the default target of date)
   remove_field => "your_date_field"
}
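
For completeness, a minimal output section could look like this; the hosts value and index name are assumptions, not something from the original answer:

output {
  elasticsearch {
    hosts => ["localhost:9200"]     # assumed local cluster
    index => "postgres-archive"     # hypothetical index name
  }
}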

If you are worried about performance:

  • You will need to set the right jdbc_fetch_size (see the sketch below).
  • The Elasticsearch output is batched by default.
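
As a hedged sketch of the relevant tuning knobs, these jdbc input options control how much data is pulled per round trip and per query; the values are placeholders to tune for your data volume:

input {
  jdbc {
    # ... connection settings as above ...
    # Rows the JDBC driver fetches per round trip (placeholder value)
    jdbc_fetch_size => 10000
    # Let the plugin page through a large result set
    # instead of loading it all at once
    jdbc_paging_enabled => true
    jdbc_page_size => 100000
    statement => "SELECT * FROM your_table"
  }
}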

