Elasticsearch - split by comma - split filter Logstash
Problem description
I have a field whose values are dynamic. I want to store space-separated tokens in an array field for a completion suggester.
Let's say my field val is hi how are you; then I want to have an array with [hi how are you, how are you, are you, you].
I tried with the split filter, as my data is in CSV, but I couldn't achieve that. Is there any way to do this with only ES or Logstash?
Recommended answer
Based on the solution I linked to, you can achieve what you need as follows.
First, create an ingest pipeline that leverages the script processor to build the desired input array:
PUT _ingest/pipeline/csv-parser
{
  "processors": [
    {
      "csv": {
        "field": "message",
        "target_fields": [
          "val",
          "val_type",
          "id"
        ]
      }
    },
    {
      "script": {
        "source": """
          def tokens = new ArrayList(Arrays.asList(/\s+/.split(ctx.val)));
          def nbTokens = tokens.size();
          def input = [];
          for (def i = nbTokens; i > 0; i--) {
            input.add(tokens.join(" "));
            tokens.remove(0);
          }
          ctx.val = [
            'input': input,
            'contexts': [
              'type': [ctx.val_type]
            ]
          ]
        """
      }
    },
    {
      "remove": {
        "field": "message"
      }
    }
  ]
}
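The core of the Painless script is a loop that repeatedly joins the remaining tokens and then drops the leading one, producing every token suffix of the original string. The same logic can be sketched in Python for clarity (the function name build_suffixes is illustrative, not part of the original answer):

```python
def build_suffixes(text):
    """Build the list of token suffixes of a whitespace-separated string.

    For "hi how are you" this yields:
    ["hi how are you", "how are you", "are you", "you"]
    """
    tokens = text.split()
    # Join the tokens from position i onward, for each starting position
    return [" ".join(tokens[i:]) for i in range(len(tokens))]

print(build_suffixes("hi how are you"))
# → ['hi how are you', 'how are you', 'are you', 'you']
```

This is exactly the input array the completion suggester needs so that a query for any mid-phrase token (e.g. "are") can still match the document.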
Then you can index documents like this:
PUT index/_doc/1?pipeline=csv-parser
{
  "message": "hi how are you,seller,10223667"
}
And the resulting document will look like this:
GET index/_doc/1
->
{
  "val" : {
    "input" : [
      "hi how are you",
      "how are you",
      "are you",
      "you"
    ],
    "contexts" : {
      "type" : [
        "seller"
      ]
    }
  },
  "val_type" : "seller",
  "id" : "10223667"
}
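For this input/contexts structure to drive suggestions, the target index must map val as a completion field with a category context named type. The answer above does not show the mapping, so the following is a minimal sketch of what it would look like, with the index and context names assumed to match the documents shown:

```json
PUT index
{
  "mappings": {
    "properties": {
      "val": {
        "type": "completion",
        "contexts": [
          {
            "name": "type",
            "type": "category"
          }
        ]
      }
    }
  }
}
```

With this mapping in place, a suggest query filtered on "contexts": { "type": ["seller"] } would match the document for any of its token suffixes.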