Import old data from postgres to elasticsearch
Problem Description
I have a lot of data in my postgres database (on a remote server). This is the data of the past 1 year, and I want to push it to elasticsearch now.

The data has a time field in this format: 2016-09-07 19:26:36.817039+00.
I want this to be the timefield (@timestamp) in elasticsearch, so that I can view it in kibana and see some visualizations over the last year.
I need help on how to push all this data efficiently. I cannot figure out how to get all this data from postgres.
I know we can inject data via the jdbc plugin, but I think I cannot create my @timestamp field with that.
I also know about zombodb, but I am not sure whether it lets me set my own timefield.
Also, the data is in bulk, so I am looking for an efficient solution.
I need help on how I can do this, so suggestions are welcome.
Recommended Answer
"I know we can inject data via the jdbc plugin, but I think I cannot create my @timestamp field with that."
This should be doable with Logstash. The first starting point should probably be this blog post. And remember that Logstash always consists of 3 parts:
- Input: JDBC input. If you only need to import once, skip the schedule option; otherwise set the right timing in cron syntax.
- Filter: This one is not part of the blog post. You will need to use the date filter to set the right @timestamp value (an example is added at the end).
- Output: This is simply the Elasticsearch output.
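Putting those three parts together, a minimal pipeline sketch could look like the following. Note that the driver path, connection string, credentials, table, field name, and date pattern are all placeholders that you would adapt to your own setup:

```conf
input {
  jdbc {
    jdbc_driver_library => "/path/to/postgresql-jdbc-driver.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_connection_string => "jdbc:postgresql://your-host:5432/your_db"
    jdbc_user => "your_user"
    jdbc_password => "your_password"
    statement => "SELECT * FROM your_table"
    # schedule => "* * * * *"   # omit for a one-off import
  }
}
filter {
  date {
    # Adjust the pattern to the actual format of your timestamp column
    match => ["your_date_field", "dd-MM-yyyy HH:mm:ss"]
    remove_field => ["your_date_field"]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "your_index"
  }
}
```

Run once with the schedule commented out to import the historical data, then re-enable the schedule if you want to keep pulling new rows.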
This will depend on the format and field name of the timestamp value in PostgreSQL, but the filter part should look something like this:
date {
  match => ["your_date_field", "dd-MM-yyyy HH:mm:ss"]
  remove_field => ["your_date_field"] # Remove the now redundant field, since the value is stored in @timestamp (the default target of date)
}
If you are worried about performance:

- You will need to set the right jdbc_fetch_size.
- The Elasticsearch output is batched by default.
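As a rough illustration of where that tuning happens, the fetch size is set inside the jdbc input block. The values below are illustrative assumptions, not recommendations, and should be tuned for your row size and memory budget:

```conf
input {
  jdbc {
    # ... connection settings omitted ...
    jdbc_fetch_size => 10000      # rows fetched from PostgreSQL per round trip
    # For very large result sets you can also page the query:
    # jdbc_paging_enabled => true
    # jdbc_page_size => 50000
  }
}
```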