What is the best way to index documents that contain mathematical expressions in Elasticsearch?

Question

The problem I am trying to solve is that I have a bunch of documents which contain mathematical expressions/formulas. I want to search the documents by formula or expression.

So far, based on my research, I'm considering converting each mathematical expression to LaTeX format and storing it as a string in the database (Elasticsearch).

With this approach, will I be able to search for documents by the LaTeX string?

For example, a² + b² = c² converts to the LaTeX string a^{2} + b^{2} = c^{2}. Can this string be searched in Elasticsearch?

Recommended answer

I agree with user @Lue E, with some modifications. I first tried a plain keyword approach, but it gave me some issues, so I switched to using the keyword tokenizer in my own custom analyzer, which should cover most of your use cases.

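Index settings and mapping. The keyword tokenizer emits the whole formula as a single token, so the field is searched as one exact string; the lowercase filter makes the search case-insensitive; the trim filter removes leading and trailing whitespace.

{
    "settings": {
        "analysis": {
            "analyzer": {
                "my_custom_analyzer": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": [
                        "lowercase",
                        "trim"
                    ]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "mathformula": {
                "type": "text",
                "analyzer": "my_custom_analyzer"
            }
        }
    }
}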
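
To make the effect of this analyzer concrete, here is a rough pure-Python model of its three stages. It is only a sketch for illustration, not Elasticsearch code: the function name `analyze` is hypothetical, and Python's `str.lower()` and `str.strip()` are assumed to be close enough stand-ins for the lowercase and trim filters on ASCII formulas.

```python
def analyze(text: str) -> list[str]:
    """Approximate my_custom_analyzer: keyword tokenizer, lowercase, trim."""
    # keyword tokenizer: the entire input becomes one token, no splitting
    tokens = [text]
    # lowercase filter: makes matching case-insensitive
    tokens = [t.lower() for t in tokens]
    # trim filter: strips leading/trailing whitespace only, not inner spaces
    tokens = [t.strip() for t in tokens]
    return tokens


print(analyze("  A2+B2 = C2 "))
print(analyze("(a+b)^2 = a^2 + b^2 + 2ab"))
```

Each formula ends up as exactly one token, which is why a search only hits when the whole formula matches. On a running cluster, the analyze API can be used to inspect the real tokens.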

Index sample documents

{
    "mathformula" : "(a+b)^2 = a^2 + b^2 + 2ab"
}

{
    "mathformula" : "a2+b2 = c2"
}

Search query (a match query, which applies the same analyzer that was used at index time)

{
    "query": {
        "match" : {
            "mathformula" : {
                "query" : "a2+b2 = c2"
            }
        }
    }
}

The search result contains only the document whose formula matches the query:

"hits": [
    {
        "_index": "so_math",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.6931471,
        "_source": {
            "mathformula": "a2+b2 = c2"
        }
    }
]
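
Why only one hit comes back can be sketched in plain Python. This is a simplified simulation under the assumption that a match query against a single-token field behaves like an equality test after lowercasing and trimming; the helpers `normalize` and `match` are hypothetical names, not part of any Elasticsearch client.

```python
def normalize(text: str) -> str:
    # Stand-in for keyword tokenizer + lowercase + trim: one token per formula
    return text.lower().strip()


def match(query: str, docs: list[str]) -> list[str]:
    # The query is analyzed the same way as the indexed field, then compared
    wanted = normalize(query)
    return [doc for doc in docs if normalize(doc) == wanted]


docs = ["(a+b)^2 = a^2 + b^2 + 2ab", "a2+b2 = c2"]

for query in ["a2+b2 = c2", " A2+B2 = C2 ", "a2 + b2 = c2", "a2+b2"]:
    print(repr(query), "->", match(query, docs))
```

In this model, differences in case or surrounding whitespace still match, while a different internal spacing or a partial formula does not. If that is too strict for your data, the formulas would need to be normalized to one canonical spelling before indexing and before querying.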
