Elasticsearch: merge multiple indexes based on a common field

Problem description

I'm using ELK to build views from data in two different databases: one is MySQL, the other is PostgreSQL. There is no way to write a join query across those two DB instances, but both have a common field called "nic". The following are sample documents from each index.

MySQL

Index: user_detail

"_id": "871123365V",
"_source": {
    "type": "db-poc-user",
    "fname": "Iraj",
    "@version": "1",
    "field_lname": "Sanjeewa",
    "nic": "871456365V",
    "@timestamp": "2020-07-22T04:12:00.376Z",
    "id": 2,
    "lname": "Santhosh"
  }

PostgreSQL

Index: track_details

"_id": "871456365V",
"_source": {
   "@version": "1",
   "nic": "871456365V",
   "@timestamp": "2020-07-22T04:12:00.213Z",
   "track": "ELK",
   "type": "db-poc-ceg"
},

I want to merge both indexes into a single new index using the common field "nic", so that I can create visualizations in Kibana. How can this be achieved?

Please note that each document in the new index should have "nic", "fname", "lname" and "track" as fields. I am not looking for an aggregation.

Recommended answer

I would leverage the enrich processor to achieve this.

First, you need to create an enrich policy (use the smallest index, let's say it's user_detail):

PUT /_enrich/policy/user-policy
{
  "match": {
    "indices": "user_detail",
    "match_field": "nic",
    "enrich_fields": ["fname", "lname"]
  }
}

Then you can execute that policy in order to create the enrich index:

POST /_enrich/policy/user-policy/_execute
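
To make the effect of executing the policy more concrete, here is a minimal in-memory sketch in Python. It is not Elasticsearch code, only an illustration under the assumption that a match policy behaves like a lookup table keyed by match_field and holding just match_field plus enrich_fields. The helper name build_enrich_lookup and the sample list are hypothetical, and the sample nic is aligned with the track_details document (see the PS at the end).

```python
# Sample rows standing in for the user_detail index (values from the question,
# with nic aligned to the track_details document, as the answer assumes).
user_detail = [
    {"id": 2, "nic": "871456365V", "fname": "Iraj", "lname": "Santhosh",
     "type": "db-poc-user", "@version": "1"},
]


def build_enrich_lookup(docs, match_field, enrich_fields):
    """Build a dict keyed by match_field that keeps only the policy's fields.

    Hypothetical helper: it imitates the idea of an enrich index,
    not its actual storage format.
    """
    lookup = {}
    for doc in docs:
        key = doc.get(match_field)
        if key is None:
            continue  # a document without the match field cannot be looked up
        kept = {match_field: key}
        for field in enrich_fields:
            if field in doc:
                kept[field] = doc[field]
        lookup[key] = kept  # a later document with the same key overwrites
    return lookup


lookup = build_enrich_lookup(user_detail, "nic", ["fname", "lname"])
print(lookup)
```

Fields that are not listed in the policy (id, type, @version) do not end up in the lookup, which is why only fname and lname are declared in enrich_fields above.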

The next step requires you to create an ingest pipeline that uses the above enrich policy/index:

PUT /_ingest/pipeline/user_lookup
{
  "description" : "Enriching user details with tracks",
  "processors" : [
    {
      "enrich" : {
        "policy_name": "user-policy",
        "field" : "nic",
        "target_field": "tmp",
        "max_matches": 1
      }
    },
    {
      "script": {
        "if": "ctx.tmp != null",
        "source": "ctx.putAll(ctx.tmp); ctx.remove('tmp');"
      }
    },
    {
      "remove": {
        "field": ["@version", "@timestamp", "type"]
      }
    }
  ]
}
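
As a rough illustration of what the three processors do to a single document, the following Python sketch mimics them in memory. It is only a model of the logic, not the ingest API: the function name apply_user_lookup, the hard-coded lookup dict, and the second sample document with nic "000000000V" are made up for the example. It also assumes that a document without a match simply passes through without the tmp field.

```python
# Stand-in for the enrich index: nic -> fields the policy exposes.
lookup = {
    "871456365V": {"nic": "871456365V", "fname": "Iraj", "lname": "Santhosh"},
}


def apply_user_lookup(doc, lookup):
    """Mimic the enrich, script and remove processors on one document."""
    ctx = dict(doc)  # work on a copy so the input document is left untouched

    # enrich: on a match, store the looked-up fields under "tmp"
    match = lookup.get(ctx.get("nic"))
    if match is not None:
        ctx["tmp"] = dict(match)

    # script: flatten "tmp" into the document root, then drop it
    # (same idea as ctx.putAll(ctx.tmp); ctx.remove('tmp'))
    if ctx.get("tmp") is not None:
        ctx.update(ctx["tmp"])
        del ctx["tmp"]

    # remove: strip the Logstash bookkeeping fields; unlike this lenient
    # pop(), the real processor can fail on a missing field unless told not to
    for field in ("@version", "@timestamp", "type"):
        ctx.pop(field, None)
    return ctx


track_doc = {
    "@version": "1",
    "nic": "871456365V",
    "@timestamp": "2020-07-22T04:12:00.213Z",
    "track": "ELK",
    "type": "db-poc-ceg",
}
# Hypothetical document whose nic has no counterpart in user_detail.
orphan_doc = {"nic": "000000000V", "track": "ELK", "type": "db-poc-ceg"}

joined = apply_user_lookup(track_doc, lookup)
unmatched = apply_user_lookup(orphan_doc, lookup)
print(joined)
print(unmatched)
```

The second document shows why the script processor is guarded with an if condition: without a match there is no tmp field to flatten, so the document keeps only its own fields.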

Finally, you're now ready to create your target index with the joined data. Simply leverage the _reindex API combined with the ingest pipeline we've just created:

POST _reindex
{
  "source": {
    "index": "track_details"
  },
  "dest": {
    "index": "user_tracks",
    "pipeline": "user_lookup"
  }
}

After running this, the user_tracks index will contain exactly what you need, for instance:

  {
    "_index" : "user_tracks",
    "_type" : "_doc",
    "_id" : "0uA8dXMBU9tMsBeoajlw",
    "_score" : 1.0,
    "_source" : {
      "fname" : "Iraj",
      "nic" : "871456365V",
      "lname" : "Santhosh",
      "track" : "ELK"
    }
  }

If your source indexes ever change (new users, changed names, etc.), you'll need to re-run the above steps. Before doing so, delete the ingest pipeline and the enrich policy, in that order, since a policy that is still referenced by a pipeline cannot be deleted:

DELETE /_ingest/pipeline/user_lookup
DELETE /_enrich/policy/user-policy

After that you can freely re-run the above steps.

PS: Note that I cheated a bit, since the record in user_detail doesn't have the same nic in your example, but I guess that was a copy/paste issue.
