How can I get the total count of each word in an Elasticsearch document?


Problem description

I searched for this question but couldn't find a useful answer. I want to get the total count of each word in a document. For example, I have some tweets in my index, and one of them says "It is so boring here I want to go to my home sweet home". The query should return a response like this:

It:1
is:1
so:1
boring:1
here:1
I:1
want:1
to:2
go:1
my:1
home:2
sweet:1

Is it possible to do that?

Solution

You're looking for term vectors, which rely on analyzers. Because the counts are taken over the analyzed terms, you can define whatever analyzer you need, e.g. a stemming analyzer that reduces words to their root/normal form. Take a look at the documentation for further details.
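To make the role of the analyzer concrete, here is a rough Python sketch of per-document term counting. The regular-expression tokenizer and the `count_terms` helper are illustrative assumptions, not Elasticsearch's standard tokenizer, and no stemming is applied. Lowercasing is enough to merge "So" and "so", but "boring" and "bored" remain separate terms here, whereas a stemming analyzer would fold them into one.

```python
import re
from collections import Counter


def count_terms(text):
    """Lowercase the text, split it into word-like tokens, and count them.

    This is only a stand-in for an analyzer: a lowercase step followed by
    a naive regex tokenizer (letters, digits and apostrophes).
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)


tweet = "It is so boring here I want to go to my home sweet home. So I'm bored"
counts = count_terms(tweet)

# Print each term with its frequency, in alphabetical order.
for term, freq in sorted(counts.items()):
    print(f"{term}:{freq}")
```

Elasticsearch does the equivalent work at index time with the analyzer configured on the field, so the terms you get back are whatever that analyzer emits.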

In:

POST so/_close
PUT so/_settings
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "my_stemmer"]
        }
      },
      "filter": {
        "my_stemmer": {
          "type": "stemmer",
          "name": "english"
        }
      }
    }
  }
}
POST so/_open
PUT so/t1/_mapping
{
  "t1": {
    "properties": {
      "tweet": {
        "type": "string",
        "store": true,
        "index_analyzer": "my_analyzer"
      }
    }
  }
}
POST so/t1/1
{"tweet": "It is so boring here I want to go to my home sweet home. So I'm bored"}

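Out (the term vectors response for document 1; the original answer does not show the request, which on the Elasticsearch 1.x API used here would presumably be GET so/t1/1/_termvector):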

{
   "_index": "so",
   "_type": "t1",
   "_id": "1",
   "_version": 2,
   "found": true,
   "term_vectors": {
      "tweet": {
         "field_statistics": {
            "sum_doc_freq": 13,
            "doc_count": 1,
            "sum_ttf": 17
         },
         "terms": {
            "bore": {
               "term_freq": 2,
               ...
            },
            "go": {
               "term_freq": 1,
               ...
            },
            "here": {
               "term_freq": 1,
               ...
            },
            "home": {
               "term_freq": 2,
               ...
            },
            "i": {
               "term_freq": 1,
               ...
            },
            "i'm": {
               "term_freq": 1,
               ...
            },
            "is": {
               "term_freq": 1,
               ...
            },
            "it": {
               "term_freq": 1,
               ...
            },
            "my": {
               "term_freq": 1,
               ...
            },
            "so": {
               "term_freq": 2,
               ...
            },
            "sweet": {
               "term_freq": 1,
               ...
            },
            "to": {
               "term_freq": 2,
               ...
            },
            "want": {
               "term_freq": 1,
               ...
            }
         }
      }
   }
}
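
If you need the plain word:count list from the question, the `term_freq` values can be pulled out of the response on the client side. This is a minimal Python sketch, assuming the response has already been parsed into a dict. `term_counts` and the trimmed `sample` are hypothetical names and data for illustration, in the same shape as the output above.

```python
import json


def term_counts(response, field):
    """Return {term: term_freq} for one field of a term vectors response.

    Missing keys yield an empty dict instead of raising, e.g. when the
    document was not found or the field has no term vectors.
    """
    terms = response.get("term_vectors", {}).get(field, {}).get("terms", {})
    return {term: info["term_freq"] for term, info in terms.items()}


# Trimmed-down sample with only three of the terms.
sample = json.loads("""
{
  "found": true,
  "term_vectors": {
    "tweet": {
      "terms": {
        "bore":  {"term_freq": 2},
        "home":  {"term_freq": 2},
        "sweet": {"term_freq": 1}
      }
    }
  }
}
""")

result = term_counts(sample, "tweet")
for term, freq in sorted(result.items()):
    print(f"{term}:{freq}")
```

Note that the keys are analyzed terms, so with the stemming analyzer above you get "bore" rather than "boring" or "bored".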
