How can I get the total count of each word in an Elasticsearch document?
I searched for this question but couldn't find a useful answer. I want to get the total count of each word in a document. For example, I have some tweets in my indices, and one of them says "It is so boring here I want to go to my home sweet home". The query should return a response like this:
It:1
is:1
so:1
boring:1
here:1
I:1
want:1
to:2
go:1
my:1
home:2
sweet:1
Is it possible to do that?
You're looking for term vectors, which leverage analyzers. Because they do, you can define any analyzer you need, e.g. a stemming analyzer that reduces words to their root/normal form.
Take a look at the documentation for further details.
In:
POST so/_close

PUT so/_settings
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "my_stemmer"]
        }
      },
      "filter": {
        "my_stemmer": {
          "type": "stemmer",
          "name": "english"
        }
      }
    }
  }
}

POST so/_open

PUT so/t1/_mapping
{
  "t1": {
    "properties": {
      "tweet": {
        "type": "string",
        "store": true,
        "index_analyzer": "my_analyzer"
      }
    }
  }
}

POST so/t1/1
{"tweet": "It is so boring here I want to go to my home sweet home. So I'm bored"}
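Out (the term vectors response for document 1; on Elasticsearch 1.x, which this mapping syntax targets, the request would be something like GET so/t1/1/_termvector):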
{
  "_index": "so",
  "_type": "t1",
  "_id": "1",
  "_version": 2,
  "found": true,
  "term_vectors": {
    "tweet": {
      "field_statistics": {
        "sum_doc_freq": 13,
        "doc_count": 1,
        "sum_ttf": 17
      },
      "terms": {
        "bore": {
          "term_freq": 2,
          ...
        },
        "go": {
          "term_freq": 1,
          ...
        },
        "here": {
          "term_freq": 1,
          ...
        },
        "home": {
          "term_freq": 2,
          ...
        },
        "i": {
          "term_freq": 1,
          ...
        },
        "i'm": {
          "term_freq": 1,
          ...
        },
        "is": {
          "term_freq": 1,
          ...
        },
        "it": {
          "term_freq": 1,
          ...
        },
        "my": {
          "term_freq": 1,
          ...
        },
        "so": {
          "term_freq": 2,
          ...
        },
        "sweet": {
          "term_freq": 1,
          ...
        },
        "to": {
          "term_freq": 2,
          ...
        },
        "want": {
          "term_freq": 1,
          ...
        }
      }
    }
  }
}
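To get from this response to the flat word:count list asked for in the question, the "terms" object can be reduced on the client side. The following is only a minimal Python sketch, not part of the original answer: the helper name term_counts and the trimmed sample dictionary are hypothetical, and it assumes the response has already been parsed from JSON into a dict with the same shape as the output above.

```python
from typing import Any, Dict


def term_counts(response: Dict[str, Any], field: str) -> Dict[str, int]:
    """Map each term of `field` to its term_freq in a term vectors response.

    Hypothetical helper: returns an empty dict if the field has no term vectors.
    """
    terms = response.get("term_vectors", {}).get(field, {}).get("terms", {})
    return {term: info["term_freq"] for term, info in terms.items()}


# Trimmed-down sample with the same shape as the response above (assumed data).
sample = {
    "term_vectors": {
        "tweet": {
            "terms": {
                "bore": {"term_freq": 2},
                "home": {"term_freq": 2},
                "sweet": {"term_freq": 1},
            }
        }
    }
}

# Print one "term:count" pair per line, sorted by term.
for term, count in sorted(term_counts(sample, "tweet").items()):
    print(f"{term}:{count}")
```

Note that the keys are the analyzed terms, not the original words: with the lowercase filter and stemmer configured above, "boring" and "bored" are both counted under "bore", and "So" is counted together with "so".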