Elasticsearch getting too many results, need help filtering query


Problem description



I'm having a lot of trouble understanding the internals of the ES query system.

For example, I've got the following query:

{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "referer": "www.xx.yy.com"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "now-1h",
              "lt": "now"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "interval": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "0.5h"
      },
      "aggs": {
        "what": {
          "cardinality": {
            "field": "host"
          }
        }
      }
    }
  }
}

That request gets too many results and fails with this error:

"status" : 500, "reason" : "ElasticsearchException[org.elasticsearch.common.breaker.CircuitBreakingException: Data too large, data for field [@timestamp] would be larger than limit of [3200306380/2.9gb]]; nested: UncheckedExecutionException[org.elasticsearch.common.breaker.CircuitBreakingException: Data too large, data for field [@timestamp] would be larger than limit of [3200306380/2.9gb]]; nested: CircuitBreakingException[Data too large, data for field [@timestamp] would be larger than limit of [3200306380/2.9gb]]; "

I've also tried this request:

{
  "size": 0,
  "filter": {
    "and": [
      {
        "term": {
          "referer": "www.geoportail.gouv.fr"
        }
      },
      {
        "range": {
          "@timestamp": {
            "from": "2014-10-04",
            "to": "2014-10-05"
          }
        }
      }
    ]
  },
  "aggs": {
    "interval": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "0.5h"
      },
      "aggs": {
        "what": {
          "cardinality": {
            "field": "host"
          }
        }
      }
    }
  }
}

I would like to filter the data so that I get a correct result. Any help would be much appreciated!

Solution

I found a solution, though it's kind of weird. I followed dimzak's advice and cleared the cache:

curl --noproxy localhost -XPOST "http://localhost:9200/_cache/clear"
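
The cache-clearing call above can also be issued from a script. The following is a minimal Python sketch, assuming the same local node at http://localhost:9200. The function names build_clear_cache_request and clear_cache are hypothetical helpers, not part of any Elasticsearch client. An empty ProxyHandler is used to bypass proxies, similar in spirit to curl's --noproxy option.

```python
import urllib.request


def build_clear_cache_request(base_url="http://localhost:9200"):
    """Build (but do not send) a POST request for the _cache/clear endpoint."""
    url = base_url.rstrip("/") + "/_cache/clear"
    # method="POST" forces a POST even though there is no request body.
    return urllib.request.Request(url, method="POST")


def clear_cache(base_url="http://localhost:9200", timeout=10):
    """Send the request directly to the node, ignoring any configured proxy."""
    # ProxyHandler({}) disables proxy lookup for this opener only.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = build_clear_cache_request(base_url)
    with opener.open(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling clear_cache() against a running node returns the raw response text; nothing is sent until that function is called.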

Then I used filtering instead of querying, as Olly suggested:

{
  "size": 0,
  "query": {
    "filtered": {
      "query": {
        "term": {
          "referer": "www.xx.yy.fr"
        }
      },
      "filter": {
        "range": {
          "@timestamp": {
            "from": "2014-10-04T00:00",
            "to": "2014-10-05T00:00"
          }
        }
      }
    }
  },
  "aggs": {
    "interval": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "0.5h"
      },
      "aggs": {
        "what": {
          "cardinality": {
            "field": "host"
          }
        }
      }
    }
  }
}
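
To make the structure of that request easier to reuse, here is a rough Python sketch that builds the same body as a dictionary. The function name build_histogram_query and the legacy flag are assumptions for illustration only. With legacy=True it mirrors the filtered query above (using gte/lte, which correspond to the inclusive from/to bounds). With legacy=False it produces a bool query with a filter clause, which is the form later Elasticsearch releases expect, since filtered was removed in 5.0. The aggregation part is copied unchanged from the request above.

```python
import json


def build_histogram_query(referer, start, end, legacy=True):
    """Return a request body: term on referer, date range filter, half-hour histogram."""
    term = {"term": {"referer": referer}}
    # gte/lte are the inclusive equivalents of the from/to keys used above.
    date_range = {"range": {"@timestamp": {"gte": start, "lte": end}}}

    if legacy:
        # Same shape as the filtered query in the answer.
        query = {"filtered": {"query": term, "filter": date_range}}
    else:
        # Assumed replacement for versions without the filtered query.
        query = {"bool": {"must": [term], "filter": [date_range]}}

    aggs = {
        "interval": {
            "date_histogram": {"field": "@timestamp", "interval": "0.5h"},
            "aggs": {"what": {"cardinality": {"field": "host"}}},
        }
    }
    return {"size": 0, "query": query, "aggs": aggs}


if __name__ == "__main__":
    body = build_histogram_query(
        "www.xx.yy.fr", "2014-10-04T00:00", "2014-10-05T00:00"
    )
    print(json.dumps(body, indent=2))
```

Either way the time range sits in a filter clause, so it restricts the documents without contributing to scoring.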

I cannot accept both of your answers. I think dimzak deserves it most, but thumbs up to you both :)
