How can I get the total count of each word in an Elasticsearch document?


Question

I searched for this question but couldn't find a useful answer. I want to get the total count of each word in a document. For example, I have some tweets in my indices, and one of them says something like "It is so boring here I want to go to my home sweet home". The query should return a response like this:

It:1
is:1
so:1
boring:1
here:1
I:1
want:1
to:2
go:1
my:1
home:2
sweet:1

Is it possible to do that?

Recommended answer

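You are looking for term vectors, which build on analyzers. With this approach you can define whatever analyzer you need, for example a stemming analyzer that reduces words to their root form.
See the Elasticsearch documentation for further details.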
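
To make the idea concrete without a running cluster, here is a minimal Python sketch of what a per-document term count amounts to: split the text into tokens, lowercase them, and count. The regular expression is only a rough stand-in for the standard tokenizer, the function names are made up for illustration, and stemming is left out, so "boring" stays "boring" here.

```python
import re
from collections import Counter


def analyze(text):
    """Crude stand-in for a standard tokenizer plus lowercase filter (no stemming)."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9']+", text)]


def term_frequencies(text):
    """Count how often each analyzed term occurs in one piece of text."""
    return Counter(analyze(text))


tweet = "It is so boring here I want to go to my home sweet home"
frequencies = term_frequencies(tweet)

# Print one "term:count" line per term, sorted alphabetically.
for term, freq in sorted(frequencies.items()):
    print(f"{term}:{freq}")
```

Because the tokens are lowercased, "It" and "I" are counted as "it" and "i", which is also what the lowercase filter does in the analyzer defined in the answer.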

In:

POST so/_close
PUT so/_settings
{
  "settings": {
    "analysis":{ 
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "my_stemmer"]
        }
      },
      "filter": {
        "my_stemmer": {
          "type": "stemmer",
          "name": "english"
        }
      }
    }
  }
}
POST so/_open
PUT so/t1/_mapping
{
  "t1": {
    "properties": {
      "tweet": {
        "type": "string",
        "store": true,
        "index_analyzer": "my_analyzer"
      }
    }
  }
}
POST so/t1/1
{"tweet": "It is so boring here I want to go to my home sweet home. So I'm bored"}
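
The requests above set up the index and store the tweet, but the call that actually fetches the term vectors is not shown. The sketch below builds such a request in Python. It rests on assumptions: the mapping syntax ("type": "string", "index_analyzer") suggests an Elasticsearch 1.x cluster, where the endpoint is `_termvector` (later versions use `_termvectors`), and the host `http://localhost:9200` and the helper names are hypothetical.

```python
import json
from urllib.request import Request, urlopen


def termvector_request(base_url, index, doc_type, doc_id, field,
                       endpoint="_termvector"):
    """Build a GET request for the term vectors of one field of one document.

    The endpoint name is an assumption: "_termvector" on Elasticsearch 1.x,
    "_termvectors" on later versions.
    """
    url = f"{base_url}/{index}/{doc_type}/{doc_id}/{endpoint}?fields={field}"
    return Request(url, method="GET")


def fetch_term_vectors(request):
    """Send the request and decode the JSON response (needs a running cluster)."""
    with urlopen(request) as response:
        return json.load(response)


# Hypothetical local cluster; index, type, and id match the example above.
req = termvector_request("http://localhost:9200", "so", "t1", "1", "tweet")
print(req.get_method(), req.full_url)
```

Calling `fetch_term_vectors(req)` against a live cluster should return a JSON body with a `term_vectors` section for the `tweet` field.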

Out:

{
   "_index": "so",
   "_type": "t1",
   "_id": "1",
   "_version": 2,
   "found": true,
   "term_vectors": {
      "tweet": {
         "field_statistics": {
            "sum_doc_freq": 13,
            "doc_count": 1,
            "sum_ttf": 17
         },
         "terms": {
            "bore": {
               "term_freq": 2,
               ...
            },
            "go": {
               "term_freq": 1,
               ...
            },
            "here": {
               "term_freq": 1,
               ...
            },
            "home": {
               "term_freq": 2,
               ...
            },
            "i": {
               "term_freq": 1,
               ...
            },
            "i'm": {
               "term_freq": 1,
               ...
            },
            "is": {
               "term_freq": 1,
               ...
            },
            "it": {
               "term_freq": 1,
               ...
            },
            "my": {
               "term_freq": 1,
               ...
            },
            "so": {
               "term_freq": 2,
               ...
            },
            "sweet": {
               "term_freq": 1,
               ...
            },
            "to": {
               "term_freq": 2,
               ...
            },
            "want": {
               "term_freq": 1,
               ...
            }
         }
      }
   }
}
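
To turn a response like this into the `word:count` list the question asks for, only the `term_freq` of each term is needed. Here is a minimal Python sketch, assuming the response has already been parsed into a dictionary; the function name and the shortened sample are made up for illustration.

```python
def term_counts(response, field):
    """Map each term of the given field to its term_freq."""
    terms = response["term_vectors"][field]["terms"]
    return {term: info["term_freq"] for term, info in terms.items()}


# Shortened sample with the same shape as the response above.
sample = {
    "term_vectors": {
        "tweet": {
            "terms": {
                "bore": {"term_freq": 2},
                "home": {"term_freq": 2},
                "sweet": {"term_freq": 1},
            }
        }
    }
}

counts = term_counts(sample, "tweet")
for term, count in sorted(counts.items()):
    print(f"{term}:{count}")
```

Note that the keys are analyzed terms rather than the original words: with the stemming analyzer above, "boring" and "bored" are both counted under "bore".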
