Get top 10 most used words in text fields

Published: 2020/10/28 2:14:52 · Tags: elasticsearch, kibana


Question

I have an index containing thousands of documents, each of which has a full-text field.

I want to search across all of those fields and fetch the 10 words that occur most often.

If possible, I would also like a way to visualize the result in Kibana.

Recommended answer

The most common way to achieve that is to duplicate your full-text field with the `keyword` datatype. That lets you run a terms aggregation on that field (doc here). Note that a `keyword` field indexes each value as a single term, so a terms aggregation on it counts whole field values; to count individual words, the aggregation has to run on an analyzed `text` field with `fielddata` enabled, which can use a lot of heap memory. You could also consider a significant terms aggregation (doc here), which helps keep stopwords and other common words out of the results. In ES 6.x and later you can also use the significant text aggregation (doc here), which works without creating a keyword field, but I have never tried it, so I don't know how well it works. If instead you need the frequency of each word within individual documents, use the term vectors API (doc here).
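As a rough sketch of the requests the answer refers to, the Python below only builds Elasticsearch request bodies as plain dicts and parses response dicts; it does not talk to a cluster. Assumptions: the index `my-index` and the field `content` are hypothetical, the mapping uses the typeless format of Elasticsearch 7.x and later, and the terms aggregation runs on a `text` field with `fielddata` enabled rather than on a `keyword` copy, so that each bucket is a single word.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical names, not taken from the question.
INDEX = "my-index"
FIELD = "content"


def index_mapping(field: str = FIELD, analyzer: str = "standard") -> Dict:
    """Body for PUT /my-index (typeless mapping, Elasticsearch 7.x+).

    The field stays an analyzed `text` field so each word becomes a term;
    fielddata=True lets a terms aggregation read those terms. fielddata is
    loaded onto the JVM heap, so it can be expensive on large indices.
    """
    return {
        "mappings": {
            "properties": {
                field: {"type": "text", "analyzer": analyzer, "fielddata": True}
            }
        }
    }


def top_words_query(field: str = FIELD, size: int = 10) -> Dict:
    """Body for POST /my-index/_search returning the `size` most frequent words.

    Each bucket's doc_count is the number of documents containing the word,
    not the total number of times the word occurs.
    """
    return {
        "size": 0,  # no hits needed, only the aggregation
        "aggs": {"top_words": {"terms": {"field": field, "size": size}}},
    }


def significant_words_query(query_text: str, field: str = FIELD, size: int = 10) -> Dict:
    """Body for a significant_text aggregation (ES 6.0+), which reads the text
    field directly, with no fielddata or keyword copy. It ranks words that are
    unusually frequent in the documents matching `query_text` compared with the
    whole index, which tends to push very common words down.
    """
    return {
        "size": 0,
        "query": {"match": {field: query_text}},
        "aggs": {
            "significant_words": {"significant_text": {"field": field, "size": size}}
        },
    }


def term_vectors_body(field: str = FIELD) -> Dict:
    """Body for GET /my-index/_termvectors/<doc id>: per-document term frequencies."""
    return {"fields": [field], "term_statistics": False, "field_statistics": False}


def bucket_counts(response: Dict, agg_name: str = "top_words") -> List[Tuple[str, int]]:
    """(word, doc_count) pairs from a search response's aggregation buckets."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [(b["key"], b["doc_count"]) for b in buckets]


def doc_top_words(response: Dict, field: str = FIELD, n: int = 10) -> List[Tuple[str, int]]:
    """Top `n` (word, term_freq) pairs from a _termvectors response for one document."""
    terms = response["term_vectors"][field]["terms"]
    ranked = sorted(terms.items(), key=lambda kv: kv[1]["term_freq"], reverse=True)
    return [(word, stats["term_freq"]) for word, stats in ranked[:n]]


if __name__ == "__main__":
    # Print the aggregation body, e.g. to paste into Kibana Dev Tools.
    print(json.dumps(top_words_query(), indent=2))
```

For example, send `index_mapping()` with `PUT /my-index` before indexing, send `top_words_query()` to `POST /my-index/_search`, and pass the decoded JSON response to `bucket_counts` to get the ten (word, doc_count) pairs.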
