Elasticsearch must clause faster than filter

Question

We use Elasticsearch 7.2 and we've been observing something weird lately.

We tried executing the following two queries:

{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "customer(keyword_field)": "big_customer"
          }
        }
      ]
    }
  }
}

{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "customer(keyword_field)": "big_customer"
          }
        }
      ]
    }
  }
}

Each query matches around 1 million documents. The 1st one was faster than the 2nd (10 times faster!). I expected the 1st to be slower because of scoring.

Also, when I added sorting, the difference disappeared: the 2nd took about the same time as before, and the 1st became as slow as the 2nd.

Recommended answer

You are using a term query, which is not analyzed: the search term does not go through the analysis process (char filter, tokenizer, token filter). It is also being matched against a keyword field, which is likewise not analyzed.
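To make the distinction concrete, here is a minimal Python sketch of a toy inverted index. It assumes a hypothetical `analyze` function (lowercasing plus whitespace splitting) standing in for the analysis chain; real Elasticsearch analyzers are configurable and more involved. The only point is that a term lookup on a keyword-style field is a single exact-match access, with no analysis step in front of it.

```python
from collections import defaultdict


def analyze(text):
    """Hypothetical stand-in for an analysis chain: lowercase, split on whitespace."""
    return text.lower().split()


def build_keyword_index(docs):
    """Index each value as one unmodified term, as a keyword field would."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        index[value].add(doc_id)
    return index


def build_text_index(docs):
    """Index each value as the tokens produced by analyze(), as a text field would."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for token in analyze(value):
            index[token].add(doc_id)
    return index


def term_lookup(index, term):
    """Exact-match lookup: the search term is used as-is, with no analysis."""
    return index.get(term, set())


# Sample data, made up for illustration.
docs = {1: "big_customer", 2: "Big Customer", 3: "big_customer"}
keyword_index = build_keyword_index(docs)
text_index = build_text_index(docs)
```

With these sample values, `term_lookup(keyword_index, "big_customer")` returns documents 1 and 3, while looking up `"Big"` in `text_index` finds nothing, because the indexed token was lowercased and the search term was not.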

You can think of this search as being like the Java String equals code below, which compares the stored values directly and is really fast; the result can also be cached.

public boolean equals(Object anObject) {
    if (this == anObject) {
        return true;
    }
    if (anObject instanceof String) {
        String aString = (String)anObject;
        if (coder() == aString.coder()) {
            return isLatin1() ? StringLatin1.equals(value, aString.value)
                              : StringUTF16.equals(value, aString.value);
        }
    }
    return false;
}

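Now, according to the documentation:

Filter clauses are executed in filter context, meaning that scoring is ignored and clauses are considered for caching.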

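So the filter query should not be much slower. One possible reason for the slowness is the sheer number of matching documents (about 1 million): the Elasticsearch heap may not be able to keep all of them in the cache, which in turn could cause a lot of swapping between memory and disk and so add latency. In the case of the plain term query, by contrast, the cache is spared and left for other queries.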
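One way to check this explanation rather than guess is the search profile API, which reports per-shard timings for the parts of a query when the request body sets `"profile": true`. The Python sketch below only builds the two request bodies from the question with profiling switched on. The helper names are made up for illustration, and actually sending the bodies (with whatever client you use) is left out.

```python
import copy

FIELD = "customer(keyword_field)"  # field name copied verbatim from the question


def term_query(clause, value):
    """Build the bool query from the question; clause is "must" or "filter"."""
    if clause not in ("must", "filter"):
        raise ValueError("clause must be 'must' or 'filter'")
    return {"query": {"bool": {clause: [{"term": {FIELD: value}}]}}}


def with_profile(body):
    """Return a copy of a search body with the profile flag set."""
    profiled = copy.deepcopy(body)
    profiled["profile"] = True
    return profiled


# Hypothetical usage: the same term query in each clause, both profiled.
must_body = with_profile(term_query("must", "big_customer"))
filter_body = with_profile(term_query("filter", "big_customer"))
```

Comparing the `profile` sections of the two responses, ideally over several runs so that warm-up and caching effects become visible, should indicate whether the difference comes from the query itself or from something around it.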
