Get top 10 most used words in text fields

Question

I have an index containing thousands of documents, each one of them having a full text field.

I want to search through all those fields and fetch the 10 words that occur most often.
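To make the requested result concrete, here is a minimal in-memory sketch (plain Python, not Elasticsearch) of "the 10 most frequent words across all documents". The function name `top_words` and the crude regex tokenizer are assumptions for illustration; Elasticsearch's analyzers split and normalize text differently.

```python
import re
from collections import Counter


def top_words(texts, n=10):
    """Return the n most frequent words across all texts as (word, count) pairs."""
    counts = Counter()
    for text in texts:
        # Crude tokenizer (assumption): lowercase, then take runs of letters,
        # digits and apostrophes as words.
        counts.update(re.findall(r"[a-z0-9']+", text.lower()))
    return counts.most_common(n)


if __name__ == "__main__":
    docs = ["The cat sat on the mat", "The dog sat too"]
    print(top_words(docs, 3))
```

Note that without stopword filtering, words like "the" dominate the result, which is the concern the answer below addresses.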

I would also like a way of visualizing it in Kibana, if that's possible.

Recommended answer

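The most common way to achieve this is to duplicate your full-text field as a `keyword` field (for example, as a multi-field). That lets you run a terms aggregation on that field (see the terms aggregation documentation). Be aware, though, that a `keyword` field indexes each whole value as a single term, so a terms aggregation on it counts whole field values, not individual words; to count words, the aggregation has to run on the analyzed `text` field, which requires enabling `fielddata` on it and can use a lot of heap memory. You could also consider a significant terms aggregation (see its documentation), which helps keep stopwords and other common words out of the results. In ES 6.x and later you can also use the significant text aggregation (see its documentation) without creating a keyword field, but I have never tried it, so I don't know how well it works. If instead you need the word frequencies for each individual document, use the term vectors API (see its documentation).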
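A minimal sketch of the request bodies for the approaches above, written as plain Python dicts so they can be sent with any HTTP or Elasticsearch client (for example as the body of `POST /my-index/_search`). Assumptions, none of which come from the original answer: the full-text field is named `content`, the index is `my-index`, and mappings use ES 7+ syntax (ES 6.x mappings need an extra mapping-type level).

```python
import json

# Hypothetical field name: the original answer does not name the field.
FIELD = "content"

# Mapping with the analyzed text field plus a keyword multi-field "raw".
# The keyword sub-field stores each whole value as one term, so aggregating
# on "content.raw" counts whole field values, not words.
MAPPING = {
    "mappings": {
        "properties": {
            FIELD: {
                "type": "text",
                "fields": {"raw": {"type": "keyword", "ignore_above": 256}},
            }
        }
    }
}

# Alternative mapping for counting individual words: fielddata lets a terms
# aggregation use the analyzed tokens of the text field (heap-heavy).
MAPPING_FIELDDATA = {
    "mappings": {"properties": {FIELD: {"type": "text", "fielddata": True}}}
}


def terms_agg(field, size=10):
    """Search body returning the `size` most frequent terms of `field`."""
    return {
        "size": 0,  # return no hits, only aggregation buckets
        "aggs": {"top_terms": {"terms": {"field": field, "size": size}}},
    }


def significant_text_agg(field, query_text, size=10):
    """Search body for a significant_text aggregation on a text field.

    Terms are scored by how unusually frequent they are in the documents
    matching `query_text` compared with the whole index, which pushes
    stopwords and other globally common words down.
    """
    return {
        "size": 0,
        "query": {"match": {field: query_text}},
        "aggs": {"keywords": {"significant_text": {"field": field, "size": size}}},
    }


def term_vectors_body(field):
    """Body for GET /<index>/_termvectors/<doc_id>: per-document term frequencies."""
    return {"fields": [field], "term_statistics": True, "field_statistics": False}


if __name__ == "__main__":
    # Top 10 words across all documents (needs MAPPING_FIELDDATA):
    print(json.dumps(terms_agg(FIELD), indent=2))
    # Top 10 whole field values (keyword sub-field from MAPPING):
    print(json.dumps(terms_agg(FIELD + ".raw"), indent=2))
```

The aggregation results come back in `aggregations.top_terms.buckets`, each bucket holding a `key` and a `doc_count`. Note that `doc_count` counts documents containing the term, not total occurrences, which is why the term vectors API is the better fit when per-document frequencies matter.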