Elasticsearch: disable term frequency scoring
I want to change the scoring in Elasticsearch so that repeated occurrences of a term are not counted. For example, I want:
"texas texas texas"
and
"texas"
to get the same score. I found this mapping, which according to Elasticsearch disables term frequency counting, but my searches still do not return the same score:
{
  "mappings": {
    "business": {
      "properties": {
        "name": {
          "type": "string",
          "index_options": "docs",
          "norms": { "enabled": false }
        }
      }
    }
  }
}
Any help would be appreciated. I have not been able to find much information on this.
Edit:
I am adding my search code and the output I get when I use explain.
My search code:
Settings settings = ImmutableSettings.settingsBuilder()
        .put("cluster.name", "escluster")
        .build();
Client client = new TransportClient(settings)
        .addTransportAddress(new InetSocketTransportAddress("127.0.0.1", 9300));
SearchRequest request = Requests.searchRequest("businesses")
        .source(SearchSourceBuilder.searchSource()
                .query(QueryBuilders.boolQuery()
                        .should(QueryBuilders.matchQuery("name", "Texas")
                                .minimumShouldMatch("1"))))
        .searchType(SearchType.DFS_QUERY_THEN_FETCH);
ExplainRequest request2 = client.prepareIndex("businesses", "business")
When I search with explain, I get:
"took" : 14,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 1.0,
"hits" : [ {
"_shard" : 1,
"_node" : "BTqBPVDET5Kr83r-CYPqfA",
"_index" : "businesses",
"_type" : "business",
"_id" : "AU9U5KBks4zEorv9YI4n",
"_score" : 1.0,
"_source":{
"name" : "texas"
}
,
"_explanation" : {
"value" : 1.0,
"description" : "weight(_all:texas in 0) [PerFieldSimilarity], result of:",
"details" : [ {
"value" : 1.0,
"description" : "fieldWeight in 0, product of:",
"details" : [ {
"value" : 1.0,
"description" : "tf(freq=1.0), with freq of:",
"details" : [ {
"value" : 1.0,
"description" : "termFreq=1.0"
} ]
}, {
"value" : 1.0,
"description" : "idf(docFreq=2, maxDocs=3)"
}, {
"value" : 1.0,
"description" : "fieldNorm(doc=0)"
} ]
} ]
}
}, {
"_shard" : 1,
"_node" : "BTqBPVDET5Kr83r-CYPqfA",
"_index" : "businesses",
"_type" : "business",
"_id" : "AU9U5K6Ks4zEorv9YI4o",
"_score" : 0.8660254,
"_source":{
"name" : "texas texas texas"
}
,
"_explanation" : {
"value" : 0.8660254,
"description" : "weight(_all:texas in 0) [PerFieldSimilarity], result of:",
"details" : [ {
"value" : 0.8660254,
"description" : "fieldWeight in 0, product of:",
"details" : [ {
"value" : 1.7320508,
"description" : "tf(freq=3.0), with freq of:",
"details" : [ {
"value" : 3.0,
"description" : "termFreq=3.0"
} ]
}, {
"value" : 1.0,
"description" : "idf(docFreq=2, maxDocs=3)"
}, {
"value" : 0.5,
"description" : "fieldNorm(doc=0)"
} ]
} ]
}
} ]
}
It looks like it is still taking term frequency and document frequency into account. Any ideas? Sorry for the bad formatting; I don't know why it comes out looking so odd.
Edit 2:
The result of searching from the browser with http://localhost:9200/businesses/business/_search?pretty=true&qname=texas is:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"failed" : 0
},
"hits" : {
"total" : 4,
"max_score" : 1.0,
"hits" : [ {
"_index" : "businesses",
"_type" : "business",
"_id" : "AU9YcCKjKvtg8NgyozGK",
"_score" : 1.0,
"_source":{"business" : {
"name" : "texas texas texas texas" }
}
}, {
"_index" : "businesses",
"_type" : "business",
"_id" : "AU9YateBKvtg8Ngyoy-p",
"_score" : 1.0,
"_source":{
"name" : "texas" }
}, {
"_index" : "businesses",
"_type" : "business",
"_id" : "AU9YavVnKvtg8Ngyoy-4",
"_score" : 1.0,
"_source":{
"name" : "texas texas texas" }
}, {
"_index" : "businesses",
"_type" : "business",
"_id" : "AU9Yb7NgKvtg8NgyozFf",
"_score" : 1.0,
"_source":{"business" : {
"name" : "texas texas texas" }
}
} ]
}
}
It finds all 4 documents I have in the index and gives them all the same score. When I run my Java API search with explain, I get:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 3,
"successful" : 3,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 1.287682,
"hits" : [ {
"_shard" : 1,
"_node" : "BTqBPVDET5Kr83r-CYPqfA",
"_index" : "businesses",
"_type" : "business",
"_id" : "AU9YateBKvtg8Ngyoy-p",
"_score" : 1.287682,
"_source":{
"name" : "texas" }
,
"_explanation" : {
"value" : 1.287682,
"description" : "weight(name:texas in 0) [PerFieldSimilarity], result of:",
"details" : [ {
"value" : 1.287682,
"description" : "fieldWeight in 0, product of:",
"details" : [ {
"value" : 1.0,
"description" : "tf(freq=1.0), with freq of:",
"details" : [ {
"value" : 1.0,
"description" : "termFreq=1.0"
} ]
}, {
"value" : 1.287682,
"description" : "idf(docFreq=2, maxDocs=4)"
}, {
"value" : 1.0,
"description" : "fieldNorm(doc=0)"
} ]
} ]
}
}, {
"_shard" : 1,
"_node" : "BTqBPVDET5Kr83r-CYPqfA",
"_index" : "businesses",
"_type" : "business",
"_id" : "AU9YavVnKvtg8Ngyoy-4",
"_score" : 1.1151654,
"_source":{
"name" : "texas texas texas" }
,
"_explanation" : {
"value" : 1.1151654,
"description" : "weight(name:texas in 0) [PerFieldSimilarity], result of:",
"details" : [ {
"value" : 1.1151654,
"description" : "fieldWeight in 0, product of:",
"details" : [ {
"value" : 1.7320508,
"description" : "tf(freq=3.0), with freq of:",
"details" : [ {
"value" : 3.0,
"description" : "termFreq=3.0"
} ]
}, {
"value" : 1.287682,
"description" : "idf(docFreq=2, maxDocs=4)"
}, {
"value" : 0.5,
"description" : "fieldNorm(doc=0)"
} ]
} ]
}
} ]
}
}
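For reference, the numbers in the explain output above can be reproduced by hand. The sketch below assumes Lucene's classic TF-IDF formulas, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which match the values shown; the class and method names are made up for illustration. It also shows what the score would be if frequencies and norms were really disabled, assuming that the term frequency is then reported as 1 and fieldNorm as 1.0.

```java
public class ExplainScoreSketch {

    // tf as shown in the explain output: tf(freq=3.0) = 1.7320508 = sqrt(3)
    static double tf(double freq) {
        return Math.sqrt(freq);
    }

    // idf as shown in the explain output: idf(docFreq=2, maxDocs=4) = 1.287682
    static double idf(long docFreq, long maxDocs) {
        return 1.0 + Math.log((double) maxDocs / (docFreq + 1));
    }

    // fieldWeight is the product of tf, idf and fieldNorm
    static double fieldWeight(double freq, long docFreq, long maxDocs, double fieldNorm) {
        return tf(freq) * idf(docFreq, maxDocs) * fieldNorm;
    }

    public static void main(String[] args) {
        // Values taken from the second explain output (name:texas)
        System.out.println("texas             : " + fieldWeight(1.0, 2, 4, 1.0));
        System.out.println("texas texas texas : " + fieldWeight(3.0, 2, 4, 0.5));

        // Assumption: with frequencies and norms disabled, freq = 1 and fieldNorm = 1.0,
        // so both documents would get the same weight (the idf alone).
        System.out.println("both, if disabled : " + fieldWeight(1.0, 2, 4, 1.0));
    }
}
```

The fact that the output still reports termFreq=3.0 and fieldNorm 0.5 for the longer document suggests that the index was not built with the mapping shown in the question.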
It looks like you cannot override index_options for a field once the field has been set in the mapping.
Example:
PUT test
PUT test/business/_mapping
{
"properties": {
"name": {
"type": "string",
"index_options": "freqs",
"norms": {
"enabled": false
}
}
}
}
PUT test/business/_mapping
{
"properties": {
"name": {
"type": "string",
"index_options": "docs",
"norms": {
"enabled": false
}
}
}
}
GET test/business/_mapping
{
"test": {
"mappings": {
"business": {
"properties": {
"name": {
"type": "string",
"norms": {
"enabled": false
},
"index_options": "freqs"
}
}
}
}
}
}
You would have to recreate the index to pick up the new mapping.
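A minimal sketch of what recreating the index could look like over the REST API, using the JDK's java.net.http classes (Java 11+). The node address, the index name "test" and the class name are assumptions taken from the example above, not from the question's cluster. The code only builds the two requests and prints them; it does not send anything. Deleting an index removes its documents, so they would have to be indexed again afterwards.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;

public class RecreateIndexSketch {

    // Assumed values: local node and the "test" index from the example above
    static final String BASE = "http://localhost:9200";
    static final String INDEX = "test";

    // Same mapping as in the example, with index_options set to "docs" from the start
    static final String BODY =
            "{\"mappings\":{\"business\":{\"properties\":{\"name\":{"
          + "\"type\":\"string\",\"index_options\":\"docs\","
          + "\"norms\":{\"enabled\":false}}}}}}";

    // Builds DELETE /test followed by PUT /test with the mapping in the body
    static List<HttpRequest> buildRequests() {
        URI uri = URI.create(BASE + "/" + INDEX);
        HttpRequest delete = HttpRequest.newBuilder(uri).DELETE().build();
        HttpRequest create = HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(BODY))
                .build();
        return List.of(delete, create);
    }

    public static void main(String[] args) {
        // Print the requests instead of sending them
        for (HttpRequest request : buildRequests()) {
            System.out.println(request.method() + " " + request.uri());
        }
    }
}
```

To actually run the requests you would pass each one to HttpClient.send(...). After that, GET test/business/_mapping should be checked to confirm that index_options is "docs" before the documents are indexed again.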