Elasticsearch analyzer to remove quoted sentences
Question
I'm trying to create an analyzer that removes a quoted sentence within a document (or replaces it with whitespace or an empty string).
For example, given the document: this is my "test document"
I'd like, for example, the term vector to be: [this, is, my]
Answer
Daniel's answer is correct, but since it doesn't include the corresponding regex and replacement, I'm providing them here, along with a test using your text.
The index settings below use a pattern_replace character filter:
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"char_filter": [
"my_char_filter"
],
"filter": [
"lowercase"
]
}
},
"char_filter": {
"my_char_filter": {
"type": "pattern_replace",
"pattern": "\"(.*?)\"",
"replacement": ""
}
}
}
}
}
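To see what the character filter does to the text before tokenization, here is a minimal Python sketch that replays the same regex locally. It assumes Python's `re` module is an acceptable stand-in for Java's regex engine (which Elasticsearch uses) for this simple pattern, and it approximates the standard tokenizer with a `\w+` split; `char_filter` and `analyze` are hypothetical helper names, not Elasticsearch APIs.

```python
import re

# Same pattern as "my_char_filter": the JSON string "\"(.*?)\"" decodes to
# the regex "(.*?)", i.e. a double quote, as few characters as possible,
# then the next double quote.
QUOTED = re.compile(r'"(.*?)"')


def char_filter(text: str) -> str:
    """Mimic pattern_replace with an empty replacement."""
    return QUOTED.sub("", text)


def analyze(text: str) -> list[str]:
    """Rough approximation of my_analyzer: char filter, tokenize, lowercase.

    Assumption: splitting on runs of word characters is a simplification of
    the standard tokenizer, which follows Unicode text segmentation rules.
    """
    filtered = char_filter(text)
    tokens = re.findall(r"\w+", filtered)
    return [t.lower() for t in tokens]


print(analyze('this is my "test document"'))  # ['this', 'is', 'my']
```

Because the character filter runs before the tokenizer, the quoted words never reach the token stream at all, which is why they are absent from the term vector rather than merely filtered afterwards.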
Running your text through the analyze API then produces the tokens below:
POST _an
{
"text": "this is my \"test document\"",
"analyzer" : "my_analyzer"
}
Output of the above API:
{
"tokens": [
{
"token": "this",
"start_offset": 0,
"end_offset": 4,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "is",
"start_offset": 5,
"end_offset": 7,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "my",
"start_offset": 8,
"end_offset": 10,
"type": "<ALPHANUM>",
"position": 2
}
]
}
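Two details of the pattern are worth checking before relying on it: the non-greedy `.*?` and the fact that `.` does not match a newline by default (true for both Python's `re` and Java's `java.util.regex`). The Python sketch below, again using `re` as a stand-in for the Java engine, compares the pattern with a greedy variant and with a negated character class `"[^"]*"`, which is one possible alternative if quoted sentences can span lines (in the JSON settings it would be written `"\"[^\"]*\""`). The sample strings are made up for illustration.

```python
import re

text = 'say "hello" to my "little friend" today'

# Non-greedy (the pattern used above): each quoted phrase is removed separately.
lazy = re.sub(r'"(.*?)"', "", text)

# Greedy variant: matches from the FIRST quote to the LAST quote,
# swallowing the unquoted words in between.
greedy = re.sub(r'"(.*)"', "", text)

# "." does not match "\n" by default, so a quote spanning two lines
# survives the original pattern; the negated class [^"]* does match "\n".
multiline = 'keep "drop\nthis" keep'
dot_version = re.sub(r'"(.*?)"', "", multiline)
class_version = re.sub(r'"[^"]*"', "", multiline)

print(repr(lazy))           # 'say  to my  today'
print(repr(greedy))         # 'say  today'
print(repr(dot_version))    # 'keep "drop\nthis" keep' (unchanged)
print(repr(class_version))  # 'keep  keep'
```

Note that both patterns only match straight double quotes ("), not typographic quotes such as “ and ”.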