elasticsearch disable term frequency scoring

Views: 126  Published: 2017/8/7 2:24:50  Tags: elasticsearch frequency java-api term scoring


Problem description

I want to change the scoring system in Elasticsearch so that multiple occurrences of a term are not counted. For example, I want:

"texas texas texas"

and

"texas"

to come out with the same score. I found this mapping, which according to Elasticsearch should disable term frequency counting, but my searches still do not come out with the same score:

{
  "mappings": {
    "business": {
      "properties": {
        "name": {
          "type": "string",
          "index_options": "docs",
          "norms": { "enabled": false }
        }
      }
    }
  }
}

Any help would be appreciated; I have not been able to find much information on this.

Edit:

I am adding my search code and what gets returned when I use explain.

My search code:

Settings settings = ImmutableSettings.settingsBuilder().put("cluster.name", "escluster").build();
Client client = new TransportClient(settings)
        .addTransportAddress(new InetSocketTransportAddress("127.0.0.1", 9300));

// Match "Texas" in the name field; DFS_QUERY_THEN_FETCH uses index-wide term statistics
SearchRequest request = Requests.searchRequest("businesses")
        .source(SearchSourceBuilder.searchSource()
                .query(QueryBuilders.boolQuery()
                        .should(QueryBuilders.matchQuery("name", "Texas")
                                .minimumShouldMatch("1")))
                .explain(true)) // include an _explanation for every hit
        .searchType(SearchType.DFS_QUERY_THEN_FETCH);

SearchResponse response = client.search(request).actionGet();
System.out.println(response);

and when I search with explain I get:

{
  "took" : 14,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [ {
      "_shard" : 1,
      "_node" : "BTqBPVDET5Kr83r-CYPqfA",
      "_index" : "businesses",
      "_type" : "business",
      "_id" : "AU9U5KBks4zEorv9YI4n",
      "_score" : 1.0,
      "_source" : { "name" : "texas" },
      "_explanation" : {
        "value" : 1.0,
        "description" : "weight(_all:texas in 0) [PerFieldSimilarity], result of:",
        "details" : [ {
          "value" : 1.0,
          "description" : "fieldWeight in 0, product of:",
          "details" : [ {
            "value" : 1.0,
            "description" : "tf(freq=1.0), with freq of:",
            "details" : [ {
              "value" : 1.0,
              "description" : "termFreq=1.0"
            } ]
          }, {
            "value" : 1.0,
            "description" : "idf(docFreq=2, maxDocs=3)"
          }, {
            "value" : 1.0,
            "description" : "fieldNorm(doc=0)"
          } ]
        } ]
      }
    }, {
      "_shard" : 1,
      "_node" : "BTqBPVDET5Kr83r-CYPqfA",
      "_index" : "businesses",
      "_type" : "business",
      "_id" : "AU9U5K6Ks4zEorv9YI4o",
      "_score" : 0.8660254,
      "_source" : { "name" : "texas texas texas" },
      "_explanation" : {
        "value" : 0.8660254,
        "description" : "weight(_all:texas in 0) [PerFieldSimilarity], result of:",
        "details" : [ {
          "value" : 0.8660254,
          "description" : "fieldWeight in 0, product of:",
          "details" : [ {
            "value" : 1.7320508,
            "description" : "tf(freq=3.0), with freq of:",
            "details" : [ {
              "value" : 3.0,
              "description" : "termFreq=3.0"
            } ]
          }, {
            "value" : 1.0,
            "description" : "idf(docFreq=2, maxDocs=3)"
          }, {
            "value" : 0.5,
            "description" : "fieldNorm(doc=0)"
          } ]
        } ]
      }
    } ]
  }
}

It looks like it is still taking term frequency and document frequency into account. Any ideas? Sorry for the bad formatting; I don't know why it is appearing so grotesque.
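The numbers in the explain output can be reproduced by hand, which makes it easy to see which scoring factors are still active. This is a minimal sketch, assuming the classic TF-IDF formulas of Lucene 4.x's DefaultSimilarity (the Lucene line behind Elasticsearch 1.x): tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The fieldNorm values are copied from the explain output rather than computed, because Lucene stores norms in a lossy one-byte encoding. The class and method names are hypothetical.

```java
// Hypothetical sketch of Lucene's classic per-field TF-IDF weight, used only to
// reproduce the "fieldWeight in 0, product of:" values from the explain output.
final class ClassicTfIdfSketch {

    // tf = sqrt(freq); assumed to see freq = 1 when a field is indexed with "docs" only
    static double tf(double freq) {
        return Math.sqrt(freq);
    }

    // idf = 1 + ln(maxDocs / (docFreq + 1)); identical for every document matching the term
    static double idf(long docFreq, long maxDocs) {
        return 1.0 + Math.log(maxDocs / (double) (docFreq + 1));
    }

    // fieldWeight = tf * idf * fieldNorm
    static double fieldWeight(double freq, long docFreq, long maxDocs, double fieldNorm) {
        return tf(freq) * idf(docFreq, maxDocs) * fieldNorm;
    }

    public static void main(String[] args) {
        // fieldNorm 1.0 / 0.5 are taken from the explain output: the unencoded
        // norm for three terms would be 1/sqrt(3), about 0.577, stored lossily as 0.5
        System.out.println("first query,  \"texas\":             " + fieldWeight(1, 2, 3, 1.0));
        System.out.println("first query,  \"texas texas texas\": " + fieldWeight(3, 2, 3, 0.5));
        System.out.println("second query, \"texas\":             " + fieldWeight(1, 2, 4, 1.0));
        System.out.println("second query, \"texas texas texas\": " + fieldWeight(3, 2, 4, 0.5));

        // Assuming freqs are not indexed (freq treated as 1) and norms are disabled
        // (fieldNorm = 1.0), every matching document reduces to idf alone:
        System.out.println("docs-only, no norms, either document: " + fieldWeight(1, 2, 4, 1.0));
    }
}
```

Since idf depends only on docFreq and maxDocs, it is the same for both documents in each query; only tf(freq=3.0) = 1.7320508 and fieldNorm = 0.5 separate "texas texas texas" from "texas". Seeing those two factors in the explain output suggests the `docs` / norms-disabled mapping was not actually in effect for the field being searched.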

Edit Edit:

The output of the browser search http://localhost:9200/businesses/business/_search?pretty=true&qname=texas is:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "businesses",
      "_type" : "business",
      "_id" : "AU9YcCKjKvtg8NgyozGK",
      "_score" : 1.0,
      "_source":{"business" : {
"name" : "texas texas texas texas" }
}
    }, {
      "_index" : "businesses",
      "_type" : "business",
      "_id" : "AU9YateBKvtg8Ngyoy-p",
      "_score" : 1.0,
      "_source":{
"name" : "texas" }

    }, {
      "_index" : "businesses",
      "_type" : "business",
      "_id" : "AU9YavVnKvtg8Ngyoy-4",
      "_score" : 1.0,
      "_source":{
"name" : "texas texas texas" }

    }, {
      "_index" : "businesses",
      "_type" : "business",
      "_id" : "AU9Yb7NgKvtg8NgyozFf",
      "_score" : 1.0,
      "_source":{"business" : {
"name" : "texas texas texas" }
}
    } ]
  }
}

It finds all 4 documents I have in the index and gives them all the same score. When I run my Java API search with explain, I get:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.287682,
    "hits" : [ {
      "_shard" : 1,
      "_node" : "BTqBPVDET5Kr83r-CYPqfA",
      "_index" : "businesses",
      "_type" : "business",
      "_id" : "AU9YateBKvtg8Ngyoy-p",
      "_score" : 1.287682,
      "_source":{
"name" : "texas" }
,
      "_explanation" : {
        "value" : 1.287682,
        "description" : "weight(name:texas in 0) [PerFieldSimilarity], result of:",
        "details" : [ {
          "value" : 1.287682,
          "description" : "fieldWeight in 0, product of:",
          "details" : [ {
            "value" : 1.0,
            "description" : "tf(freq=1.0), with freq of:",
            "details" : [ {
              "value" : 1.0,
              "description" : "termFreq=1.0"
            } ]
          }, {
            "value" : 1.287682,
            "description" : "idf(docFreq=2, maxDocs=4)"
          }, {
            "value" : 1.0,
            "description" : "fieldNorm(doc=0)"
          } ]
        } ]
      }
    }, {
      "_shard" : 1,
      "_node" : "BTqBPVDET5Kr83r-CYPqfA",
      "_index" : "businesses",
      "_type" : "business",
      "_id" : "AU9YavVnKvtg8Ngyoy-4",
      "_score" : 1.1151654,
      "_source":{
"name" : "texas texas texas" }
,
      "_explanation" : {
        "value" : 1.1151654,
        "description" : "weight(name:texas in 0) [PerFieldSimilarity], result of:",
        "details" : [ {
          "value" : 1.1151654,
          "description" : "fieldWeight in 0, product of:",
          "details" : [ {
            "value" : 1.7320508,
            "description" : "tf(freq=3.0), with freq of:",
            "details" : [ {
              "value" : 3.0,
              "description" : "termFreq=3.0"
            } ]
          }, {
            "value" : 1.287682,
            "description" : "idf(docFreq=2, maxDocs=4)"
          }, {
            "value" : 0.5,
            "description" : "fieldNorm(doc=0)"
          } ]
        } ]
      }
    } ]
  }
}
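When comparing the browser search with the Java API search, note that the browser URL passes `qname=texas`, which is not one of Elasticsearch's URI-search parameters. A URI search takes its query from `q`, in Lucene query-string syntax such as `name:texas`, and also accepts `explain=true`. Identical 1.0 scores for all four documents would be consistent with no query being applied at all, so the two results may not be comparable. A small sketch for building a comparable URL; the helper and its parameter names are hypothetical:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds a URI-search URL that puts the query in "q"
// (field:value query-string syntax) and requests per-hit explanations.
final class UriSearchSketch {

    static String searchUrl(String baseUrl, String index, String type, String field, String value) {
        // URLEncoder turns the ':' separator into %3A
        String q = URLEncoder.encode(field + ":" + value, StandardCharsets.UTF_8);
        return baseUrl + "/" + index + "/" + type + "/_search?pretty=true&explain=true&q=" + q;
    }

    public static void main(String[] args) {
        System.out.println(searchUrl("http://localhost:9200", "businesses", "business", "name", "texas"));
    }
}
```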

Solution

It looks like you cannot override the index options of a field after the field has initially been set in the mapping.

Example:

PUT test

PUT test/business/_mapping
{
  "properties": {
    "name": {
      "type": "string",
      "index_options": "freqs",
      "norms": {
        "enabled": false
      }
    }
  }
}

PUT test/business/_mapping
{
  "properties": {
    "name": {
      "type": "string",
      "index_options": "docs",
      "norms": {
        "enabled": false
      }
    }
  }
}

GET test/business/_mapping

{
  "test": {
    "mappings": {
      "business": {
        "properties": {
          "name": {
            "type": "string",
            "norms": {
              "enabled": false
            },
            "index_options": "freqs"
          }
        }
      }
    }
  }
}

You would have to delete and recreate the index (and then reindex your documents) for the new mapping to take effect.
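Because the changed `index_options` only take effect on a fresh index, one way to apply them is to delete the index, create it again with the `docs` / norms-disabled mapping from the question, and then index the documents again. A minimal sketch using the JDK's `java.net.http` client (Java 11+), assuming a node at `http://localhost:9200` and the 1.x-era `string` mapping shown in the question; the class name is hypothetical. Request building is kept separate from sending so it can be inspected without a running cluster.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

// Hypothetical sketch: drop the index and recreate it with the docs-only, norms-disabled mapping.
final class RecreateIndexSketch {

    // Same mapping as in the question, as a create-index request body
    static final String MAPPING =
        "{\"mappings\":{\"business\":{\"properties\":{\"name\":{"
        + "\"type\":\"string\",\"index_options\":\"docs\",\"norms\":{\"enabled\":false}}}}}}";

    static List<HttpRequest> recreateRequests(String baseUrl, String index) {
        URI uri = URI.create(baseUrl + "/" + index);
        HttpRequest delete = HttpRequest.newBuilder(uri).DELETE().build();
        HttpRequest create = HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(MAPPING))
                .build();
        return List.of(delete, create);
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        for (HttpRequest request : recreateRequests("http://localhost:9200", "businesses")) {
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(request.method() + " " + request.uri() + " -> " + response.statusCode());
        }
        // The old documents were deleted with the index and must be indexed again.
    }
}
```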
