Elasticsearch: Run an aggregation on a field and filter out specific values using a regexp
Problem description
I'm trying to run an aggregation on a field and ignore specific values. I have a field "path" that holds many different URL paths.
{
"size": 0,
"aggs": {
"paths": {
"terms":{
"field": "path" // Count the unique path values
}
}
},
"filter": {
"bool": {
"must_not": [
{
"regexp": {
// path MUST NOT CONTAIN media | cache
"path": {
"value": "(\/media\b|\bcache\b)"
}
}
}
]
}
}
}
When I run this, it doesn't filter out the documents whose path contains "cache" or "media".
I get the same results whether the filter is there or not.
You could try excluding those values inside the terms aggregation, like this:
{
"size": 0,
"aggs": {
"path": {
"terms": {
"field": "path",
"exclude": ".*(media|cache).*"
}
}
}
}
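To illustrate why the exclude pattern is wrapped in .* on both sides, here is a small Python sketch that imitates a terms aggregation with an exclude regex over some hypothetical path values. It is an approximation, not Elasticsearch itself: re.fullmatch stands in for Lucene's rule that a regexp must match the whole term, and Python's regex syntax is not identical to Lucene's.

```python
import re
from collections import Counter


def terms_agg_with_exclude(paths, exclude_pattern):
    """Count distinct path values, skipping values the exclude pattern matches.

    Approximation only: re.fullmatch mimics the way Elasticsearch/Lucene
    regexps must match the whole term, but Python's regex syntax is not
    identical to Lucene's.
    """
    exclude = re.compile(exclude_pattern)
    counts = Counter(p for p in paths if not exclude.fullmatch(p))
    # Highest count first, as with the terms aggregation's default order
    # (ties keep first-seen order here, which Elasticsearch may not).
    return counts.most_common()


# Hypothetical sample values for the "path" field.
sample_paths = [
    "/blog/post-1",
    "/blog/post-1",
    "/media/logo.png",
    "/api/cache/clear",
    "/about",
]

# Leading and trailing .* let the pattern match media/cache anywhere.
print(terms_agg_with_exclude(sample_paths, r".*(media|cache).*"))
# -> [('/blog/post-1', 2), ('/about', 1)]

# Without them the whole path would have to be exactly "media" or "cache",
# so nothing is excluded.
print(terms_agg_with_exclude(sample_paths, r"(media|cache)"))
```

The second call returns all four distinct paths, which is the same "nothing gets filtered" symptom described in the question.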
Caution, from the documentation:
Note: The performance of a regexp query heavily depends on the regular expression chosen. Matching everything like .* is very slow as well as using lookaround regular expressions. If possible, you should try to use a long prefix before your regular expression starts
Another approach is to get rid of those documents at the query stage: move your filter into the query, and the aggregation will then run only on the remaining results.
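A likely reason the question's filter had no effect: a top-level "filter" in a search request (renamed post_filter in Elasticsearch 1.0) only narrows the returned hits after the aggregations have been computed, so it cannot change the buckets. In addition, Lucene regexps must match the entire term and have no word-boundary operator, so (\/media\b|\bcache\b) would not match these paths even inside a query. The Python sketch below only builds the request body as a dict and prints it as JSON. It assumes a query DSL that accepts a bool query with must_not, and that path is indexed as a single unanalyzed term (not_analyzed or keyword); the function name is hypothetical.

```python
import json


def build_request(path_field="path"):
    """Build a search body whose query drops media/cache paths
    before the terms aggregation runs over the matching documents."""
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {
            "bool": {
                "must_not": [
                    # Lucene regexps are anchored to the whole term, so the
                    # leading and trailing .* allow a match anywhere in the path.
                    {"regexp": {path_field: ".*(media|cache).*"}}
                ]
            }
        },
        "aggs": {
            "paths": {"terms": {"field": path_field}}
        },
    }


print(json.dumps(build_request(), indent=2))
```

Because the exclusion happens in the query, the aggregation never sees the media or cache documents, so no exclude setting is needed on the terms aggregation.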
EDIT: With a date filter
You could add a date filter to the query so that you only get the past day's results. Something like this should work:
{
"query": {
"range": {
"name_of_date_field": {
"gte": "now-1d"
}
}
},
"size": 0,
"aggs": {
"path": {
"terms": {
"field": "path",
"exclude": ".*(media|cache).*"
}
}
}
}
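If you want both restrictions applied at the query stage, one option is to combine the date range and the path exclusion in a single bool query. This is a sketch that assumes Elasticsearch 2.x or later, where bool accepts a filter clause; name_of_date_field is the placeholder from the answer above, and the function name is hypothetical.

```python
import json


def build_daily_request(date_field="name_of_date_field", path_field="path"):
    """Build a search body that keeps documents from the past day,
    drops media/cache paths, and counts the remaining path values."""
    return {
        "size": 0,
        "query": {
            "bool": {
                # Keep only documents from the past day...
                "filter": [{"range": {date_field: {"gte": "now-1d"}}}],
                # ...and drop any whose path mentions media or cache.
                "must_not": [{"regexp": {path_field: ".*(media|cache).*"}}],
            }
        },
        "aggs": {"paths": {"terms": {"field": path_field}}},
    }


print(json.dumps(build_daily_request(), indent=2))
```

Putting the range in filter rather than must means it does not affect scoring, which does not matter here since size is 0 and only the buckets are used.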