Fast keyword extraction in Elasticsearch

Question

I have a large database of image annotations stored in Elasticsearch. I want to use it for keyword extraction. The input is text (typically a newspaper article). My basic idea for an algorithm is to go through each term in the article and use Elasticsearch to find how frequent that term is in the image annotations, then output the article's terms that are not frequent (in order to prefer names of people or places over common English words).

I don't need anything very sophisticated; the keywords are only used as suggestions for user input. But I want something faster than sending N search queries to Elasticsearch (where N is the number of terms in the text), which can be slow on large texts. Is there a robust and fast technique for keyword extraction in Elasticsearch?

Recommended answer

You can use an Elasticsearch terms aggregation for this. It returns the terms as buckets, each with a document count that indicates the term's relative frequency. Here is an example query in YAML.

query:
    match:
        annotation:
            query: text of your article
aggregations:
    term_frequencies:
        terms:
            field: annotation
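
The query above returns frequent terms from the matching annotations, while the question asks for the article's own rare terms. One way to close that gap, as a rough sketch and not part of the original answer, is to restrict the buckets to the article's terms with the terms aggregation's `include` option and then keep only the low-count buckets on the client. The tokenizer, the helper names `build_request` and `rare_terms`, the `max_doc_count` threshold, and the sample response are all illustrative assumptions. It also assumes the `annotation` field can be aggregated on (for a `text` field that typically means fielddata is enabled) and that its indexed tokens are plain lowercase words.

```python
import re


def build_request(article_text, field="annotation"):
    """Build one search body that counts every article term in a single round trip."""
    # Crude tokenizer (an assumption): lowercase alphabetic words, deduplicated.
    terms = sorted(set(re.findall(r"[a-z]+", article_text.lower())))
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {"match": {field: {"query": article_text}}},
        "aggregations": {
            "term_frequencies": {
                "terms": {
                    "field": field,
                    "include": terms,  # restrict buckets to the article's terms
                    "size": max(len(terms), 1),
                }
            }
        },
    }


def rare_terms(response, max_doc_count):
    """Return bucket keys seen in at most max_doc_count annotations, rarest first."""
    buckets = response["aggregations"]["term_frequencies"]["buckets"]
    rare = [b for b in buckets if b["doc_count"] <= max_doc_count]
    return [b["key"] for b in sorted(rare, key=lambda b: (b["doc_count"], b["key"]))]


# Hand-written response in the shape a terms aggregation returns (not real data).
sample_response = {
    "aggregations": {
        "term_frequencies": {
            "buckets": [
                {"key": "the", "doc_count": 9500},
                {"key": "prague", "doc_count": 12},
                {"key": "visited", "doc_count": 340},
                {"key": "havel", "doc_count": 3},
            ]
        }
    }
}

print(rare_terms(sample_response, max_doc_count=50))  # ['havel', 'prague']
```

The dictionary from `build_request` would be sent as the body of a single search request, so the cost is one round trip instead of N. Article terms that never occur in any annotation get no bucket by default, so they would have to be recovered by comparing the returned keys against the term list.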
