Elasticsearch - split by comma - split filter Logstash

Question

I have a field whose values are dynamic. I want to store its space-separated tokens in an array field for the completion suggester.

For example, if my field val is "hi how are you", I want an array with [hi how are you, how are you, are you, you].

I tried the split filter, since my data is in CSV, but I couldn't achieve that. Is there any way to do this using only Elasticsearch and Logstash?

Recommended answer

Based on the solution I linked to, you can achieve what you need as follows.

First, create an ingest pipeline that parses the CSV line with the csv processor and then uses the script processor to build the desired input array:

PUT _ingest/pipeline/csv-parser
{
  "processors": [
    {
      "csv": {
        "field": "message",
        "target_fields": [
          "val",
          "val_type",
          "id"
        ]
      }
    },
    {
      "script": {
        "source": """
          def tokens = new ArrayList(Arrays.asList(/\s+/.split(ctx.val)));
          def nbTokens = tokens.size();
          def input = [];
          for (def i = nbTokens; i > 0; i--) {
            input.add(tokens.join(" "));
            tokens.remove(0);
          }

          ctx.val = [
            'input': input,
            'contexts': [
              'type': [ctx.val_type]
            ]
          ]
          """
      }
    },
    {
      "remove": {
        "field": "message"
      }
    }
  ]
}

Then you can index documents like this:

PUT index/_doc/1?pipeline=csv-parser
{
  "message": "hi how are you,seller,10223667"
}

The resulting document will look like this:

GET index/_doc/1
->
{
    "val" : {
      "input" : [
        "hi how are you",
        "how are you",
        "are you",
        "you"
      ],
      "contexts" : {
        "type" : [
          "seller"
        ]
      }
    },
    "val_type" : "seller",
    "id" : "10223667"
}
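
To sanity-check the expected output without a running cluster, the transformation can be approximated offline. The following is only a rough Python sketch of what the csv, script, and remove processors do to one line; the function name simulate_pipeline is made up for illustration and is not part of Elasticsearch. Note that Python's str.split() ignores leading and trailing whitespace, which may differ from the regex split in the Painless script for padded values.

```python
import csv


def simulate_pipeline(message: str) -> dict:
    """Approximate the csv, script, and remove processors for one CSV line.

    Hypothetical helper for illustration only; not an Elasticsearch API.
    """
    # csv processor: split the line into the three target fields
    val, val_type, doc_id = next(csv.reader([message]))

    # script processor: split on whitespace, then build every suffix phrase,
    # from the full phrase down to the last token
    tokens = val.split()
    suffixes = [" ".join(tokens[i:]) for i in range(len(tokens))]

    # remove processor: the original "message" field is simply not returned
    return {
        "val": {"input": suffixes, "contexts": {"type": [val_type]}},
        "val_type": val_type,
        "id": doc_id,
    }


doc = simulate_pipeline("hi how are you,seller,10223667")
print(doc["val"]["input"])
```

For the sample line above, the returned dictionary should match the document shown by GET index/_doc/1.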
