Elasticsearch analyzer to remove quoted sentences


Question

I'm trying to create an analyzer that removes a quoted sentence within a document (or replaces it with whitespace or an empty string).

For example: this is my "test document"

For that text, I'd like the term vector to be: [this, is, my]

Recommended answer

Daniel's answer is correct, but since it leaves out the corresponding regex and replacement, I am providing them here, along with a test against your text.

The index settings below use a pattern_replace char filter, which strips the quoted text before the tokenizer runs:

{
    "settings": {
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "tokenizer": "standard",
                    "char_filter": [
                        "my_char_filter"
                    ],
                    "filter": [
                        "lowercase"
                    ]
                }
            },
            "char_filter": {
                "my_char_filter": {
                    "type": "pattern_replace",
                    "pattern": "\"(.*?)\"",
                    "replacement": ""
                }
            }
        }
    }
}
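
To see what the pattern does in isolation, here is a rough Python sketch of the same idea. It is only an approximation: Elasticsearch applies the pattern with Java's regex engine, while this uses Python's `re`, and the `\w+` split is a crude stand-in for the standard tokenizer. The function names `strip_quoted` and `rough_tokens` are made up for illustration.

```python
import re
from typing import List

# Same pattern as my_char_filter: a double quote, a lazy run of
# characters, then the next double quote.
QUOTED = re.compile(r'"(.*?)"')


def strip_quoted(text: str) -> str:
    """Mimic the char filter: replace each quoted span with an empty string."""
    return QUOTED.sub("", text)


def rough_tokens(text: str) -> List[str]:
    """Approximate the analyzer: strip quotes, lowercase, split on word characters.

    This is NOT the real standard tokenizer, just a simple stand-in.
    """
    return re.findall(r"\w+", strip_quoted(text).lower())


if __name__ == "__main__":
    print(rough_tokens('this is my "test document"'))
```

Because the quantifier is lazy (`.*?`), each quoted span is removed on its own, so text between two separate quoted sentences is kept. To replace the quoted text with whitespace instead, as the question also allows, the replacement could presumably be set to a single space rather than an empty string.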

After creating the index with these settings, send the following request to the analyze API:

POST _an

{
    "text": "this is my \"test document\"",
    "analyzer" : "my_analyzer"
}

Output of the above API call:

{
    "tokens": [
        {
            "token": "this",
            "start_offset": 0,
            "end_offset": 4,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "is",
            "start_offset": 5,
            "end_offset": 7,
            "type": "<ALPHANUM>",
            "position": 1
        },
        {
            "token": "my",
            "start_offset": 8,
            "end_offset": 10,
            "type": "<ALPHANUM>",
            "position": 2
        }
    ]
}
