Elasticsearch - generic facets structure - calculating aggregations combined with filters

Question

In a new project of ours, we were inspired by this article http://project-a.github.io/on-site-search-design-patterns-for-e-commerce/#generic-faceted-search when designing our "facet" structure. While I have it working to the extent the article describes, I have run into problems getting it to work once facets are selected. I hope someone can suggest something to try, so I don't have to go back to computing each facet in a separate aggregation.

The problem is basically that we use a single aggregation to calculate all the "facets" at once, but when I add a filter (e.g. checking a brand name), it "removes" all the other brands from the returned aggregates. What I want is for that brand to be used as a filter when calculating the other facets, but not when calculating the brand aggregation. This is necessary so that the user can, for example, select multiple brands.
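
Here is a minimal in-memory sketch in plain Python that makes the desired behavior concrete. It does not involve Elasticsearch. The catalogue and the function names are hypothetical, and it assumes values are OR-ed within one facet and AND-ed across facets. Each facet is counted over the products that match every selection except that facet's own:

```python
from collections import Counter

# Hypothetical catalogue: each product maps a facet name to its values.
PRODUCTS = [
    {"Brand": ["Adidas"], "Property": ["Organic"]},
    {"Brand": ["Nike"], "Property": ["Organic"]},
    {"Brand": ["Puma"], "Property": ["Without parfume"]},
]


def matches(product, selections, skip=None):
    """True if the product satisfies every selected facet except `skip`.

    Values within one facet are OR-ed; different facets are AND-ed.
    """
    for name, wanted in selections.items():
        if name == skip:
            continue
        if not set(product.get(name, [])) & set(wanted):
            return False
    return True


def facet_counts(products, selections):
    """Count facet values, ignoring each facet's own selection."""
    names = sorted({name for product in products for name in product})
    result = {}
    for name in names:
        counts = Counter()
        for product in products:
            if matches(product, selections, skip=name):
                counts.update(product.get(name, []))
        result[name] = dict(counts)
    return result
```

Suppose `{"Brand": ["Adidas"]}` is selected. The Brand facet still lists Adidas, Nike and Puma. Property counts only the Adidas products and returns `{"Organic": 1}`.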

Looking at https://www.contorion.de/search/Metabo_Fein/ou1-ou2?q=Winkelschleifer&c=bovy (the site described in the article above), I have selected the "Metabo" and "Fein" manufacturers (Hersteller). When I unfold the Hersteller menu, it shows all manufacturers, not just the selected ones. So I know it's possible somehow, and I hope someone out there has a hint on how to write the aggregations/filters to get the "correct e-commerce facet behavior".

On the products in ES I have the following structure (the same as in the original article, though with "C#-ified" naming):

"attributeStrings": [
    {
        "facetName": "Property",
        "facetValue": "Organic"
    },
    {
        "facetName": "Property",
        "facetValue": "Without parfume"
    },
    {
        "facetName": "Brand",
        "facetValue": "Adidas"
    }
]
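
Suppose the product data starts out as a plain mapping from facet name to values. That input shape is an assumption. Flattening it into this generic list then takes a single comprehension. The queries below rely on `attributeStrings` being mapped as a `nested` field, with `facetName` and `facetValue` as exact-value (for example keyword) fields. The post does not show the mapping itself.

```python
def to_attribute_strings(facets):
    """Flatten {facet name: [values]} into the generic nested-document list."""
    return [
        {"facetName": name, "facetValue": value}
        for name, values in facets.items()
        for value in values
    ]


# Rebuilds the "attributeStrings" array of the example product above.
example = to_attribute_strings(
    {"Property": ["Organic", "Without parfume"], "Brand": ["Adidas"]}
)
```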

So the product above has 2 attributes/facet groups: Property with 2 values (Organic, Without parfume) and Brand with 1 value (Adidas). Without any filters, I calculate the aggregations with the following query:

  "aggs": {
    "agg_attr_strings_filter": {
      "filter": {},
      "aggs": {
        "agg_attr_strings": {
          "nested": {
            "path": "attributeStrings"
          },
          "aggs": {
            "attr_name": {
              "terms": {
                "field": "attributeStrings.facetName"
              },
              "aggs": {
                "attr_value": {
                  "terms": {
                    "field": "attributeStrings.facetValue",
                    "size": 1000,
                    "order": [
                      {
                        "_term": "asc"
                      }
                    ]
                  }
                }
              }
            }
          }
        }
      }
    }
  }
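
Reading the response of this aggregation back into a usable form takes a small helper. The sketch below is hypothetical. It assumes `agg` is the `agg_attr_strings_filter` entry of the response's `aggregations` object. That entry holds the nested result, which contains one bucket per facet name with the value buckets inside. The terms aggregations run inside the nested aggregation, so `doc_count` counts `attributeStrings` entries. That equals the product count as long as a product lists each value at most once.

```python
def parse_facets(agg):
    """Turn agg_attr_strings_filter -> agg_attr_strings -> attr_name -> attr_value
    into {facetName: {facetValue: doc_count}}."""
    facets = {}
    for name_bucket in agg["agg_attr_strings"]["attr_name"]["buckets"]:
        value_buckets = name_bucket["attr_value"]["buckets"]
        facets[name_bucket["key"]] = {
            bucket["key"]: bucket["doc_count"] for bucket in value_buckets
        }
    return facets


# Hand-written response fragment for the single example product (not real output).
SAMPLE = {
    "doc_count": 1,
    "agg_attr_strings": {
        "doc_count": 3,
        "attr_name": {
            "buckets": [
                {
                    "key": "Property",
                    "doc_count": 2,
                    "attr_value": {
                        "buckets": [
                            {"key": "Organic", "doc_count": 1},
                            {"key": "Without parfume", "doc_count": 1},
                        ]
                    },
                },
                {
                    "key": "Brand",
                    "doc_count": 1,
                    "attr_value": {"buckets": [{"key": "Adidas", "doc_count": 1}]},
                },
            ]
        },
    },
}
```

With this sample, `parse_facets(SAMPLE)` gives `{"Property": {"Organic": 1, "Without parfume": 1}, "Brand": {"Adidas": 1}}`.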

Now if I select Property "Organic" and Brand "Adidas", I build the same aggregation, but with a filter that applies those two constraints (which is where it kind of goes wrong...):

  "aggs": {
    "agg_attr_strings_filter": {
      "filter": {
        "bool": {
          "filter": [
            {
              "nested": {
                "query": {
                  "bool": {
                    "filter": [
                      {
                        "term": {
                          "attributeStrings.facetName": {
                            "value": "Property"
                          }
                        }
                      },
                      {
                        "terms": {
                          "attributeStrings.facetValue": [
                            "Organic"
                          ]
                        }
                      }
                    ]
                  }
                },
                "path": "attributeStrings"
              }
            },
            {
              "nested": {
                "query": {
                  "bool": {
                    "filter": [
                      {
                        "term": {
                          "attributeStrings.facetName": {
                            "value": "Brand"
                          }
                        }
                      },
                      {
                        "terms": {
                          "attributeStrings.facetValue": [
                            "Adidas"
                          ]
                        }
                      }
                    ]
                  }
                },
                "path": "attributeStrings"
              }
            }
          ]
        }
      },
      "aggs": {
        "agg_attr_strings": {
          "nested": {
            "path": "attributeStrings"
          },
          "aggs": {
            "attr_name": {
              "terms": {
                "field": "attributeStrings.facetName",
              },
              "aggs": {
                "attr_value": {
                  "terms": {
                    "field": "attributeStrings.facetValue",
                    "size": 1000,
                    "order": [
                      {
                        "_term": "asc"
                      }
                    ]
                  }
                }
              }
            }
          }
        }
      }
    }
  }

The only way forward I can see with this model is to calculate the aggregation for each selected facet and somehow merge the results. But that seems very complex and kind of defeats the point of the model described in the article, so I hope there's a cleaner solution and someone can suggest something to try.

Recommended answer

The only way forward I can see with this model is to calculate the aggregation for each selected facet and somehow merge the results.

This is exactly right. If one facet (e.g. brand) is selected, you cannot use a global brand filter if you also want to fetch the other brands for multi-selection. Instead, compute each selected facet with every filter except its own, and compute the non-selected facets with all filters. As a result, you will have n+1 separate aggregations for n selected filters. The first one covers all facets, and each of the rest covers one selected facet.
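
The bookkeeping behind "n+1 aggregations" can be sketched in a few lines of Python. This is only an illustration. The function is hypothetical, and the `special_agg_<name>` naming simply mirrors the example query in this answer. Each entry lists the selected facets whose filters that aggregation applies:

```python
def plan_aggregations(selections):
    """Map each aggregation name to the selected facets it filters on.

    The general aggregation applies every selected filter; each selected
    facet gets its own aggregation that applies all filters except its own.
    """
    selected = list(selections)
    plan = {"agg_attr_strings_filter": selected}
    for name in selected:
        plan[f"special_agg_{name.lower()}"] = [n for n in selected if n != name]
    return plan


plan = plan_aggregations({"Property": ["Organic"], "Brand": ["Adidas"]})
# plan == {
#     "agg_attr_strings_filter": ["Property", "Brand"],
#     "special_agg_property": ["Brand"],
#     "special_agg_brand": ["Property"],
# }
```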

In your case, the query might look like this:

{
  "aggs": {
    "agg_attr_strings_filter": {
      "filter": {
        "bool": {
          "filter": [
            {
              "nested": {
                "query": {
                  "bool": {
                    "filter": [
                      {
                        "term": {
                          "attributeStrings.facetName": {
                            "value": "Property"
                          }
                        }
                      },
                      {
                        "terms": {
                          "attributeStrings.facetValue": [
                            "Organic"
                          ]
                        }
                      }
                    ]
                  }
                },
                "path": "attributeStrings"
              }
            },
            {
              "nested": {
                "query": {
                  "bool": {
                    "filter": [
                      {
                        "term": {
                          "attributeStrings.facetName": {
                            "value": "Brand"
                          }
                        }
                      },
                      {
                        "terms": {
                          "attributeStrings.facetValue": [
                            "Adidas"
                          ]
                        }
                      }
                    ]
                  }
                },
                "path": "attributeStrings"
              }
            }
          ]
        }
      },
      "aggs": {
        "agg_attr_strings": {
          "nested": {
            "path": "attributeStrings"
          },
          "aggs": {
            "attr_name": {
              "terms": {
                "field": "attributeStrings.facetName"
              },
              "aggs": {
                "attr_value": {
                  "terms": {
                    "field": "attributeStrings.facetValue",
                    "size": 1000,
                    "order": [
                      {
                        "_term": "asc"
                      }
                    ]
                  }
                }
              }
            }
          }
        }
      }
    },
    "special_agg_property": {
      "filter": {
        "nested": {
          "query": {
            "bool": {
              "filter": [
                {
                  "term": {
                    "attributeStrings.facetName": {
                      "value": "Brand"
                    }
                  }
                },
                {
                  "terms": {
                    "attributeStrings.facetValue": [
                      "Adidas"
                    ]
                  }
                }
              ]
            }
          },
          "path": "attributeStrings"
        }
      },
      "aggs": {
        "special_agg_property": {
          "nested": {
            "path": "attributeStrings"
          },
          "aggs": {
            "agg_filtered_special": {
              "filter": {
                "query": {
                  "match": {
                    "attributeStrings.facetName": "Property"
                  }
                }
              },
              "aggs": {
                "facet_value": {
                  "terms": {
                    "size": 1000,
                    "field": "attributeStrings.facetValue"
                  }
                }
              }
            }
          }
        }
      }
    },
    "special_agg_brand": {
      "filter": {
        "nested": {
          "query": {
            "bool": {
              "filter": [
                {
                  "term": {
                    "attributeStrings.facetName": {
                      "value": "Property"
                    }
                  }
                },
                {
                  "terms": {
                    "attributeStrings.facetValue": [
                      "Organic"
                    ]
                  }
                }
              ]
            }
          },
          "path": "attributeStrings"
        }
      },
      "aggs": {
        "special_agg_brand": {
          "nested": {
            "path": "attributeStrings"
          },
          "aggs": {
            "agg_filtered_special": {
              "filter": {
                "query": {
                  "match": {
                    "attributeStrings.facetName": "Brand"
                  }
                }
              },
              "aggs": {
                "facet_value": {
                  "terms": {
                    "size": 1000,
                    "field": "attributeStrings.facetValue"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
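
Because the query above follows a regular pattern, it can be generated instead of written by hand. The Python sketch below is one possible way to do it, and the function names are hypothetical. It builds the request body as a plain dict and does not talk to Elasticsearch. It produces the same aggregation layout as the example, with one small difference. The filter of each special aggregation is always wrapped in a `bool`, even when only one clause remains, which should not change which documents match.

```python
import json


def facet_filter(name, values):
    """Nested filter: products having facet `name` set to one of `values`."""
    return {
        "nested": {
            "path": "attributeStrings",
            "query": {
                "bool": {
                    "filter": [
                        {"term": {"attributeStrings.facetName": {"value": name}}},
                        {"terms": {"attributeStrings.facetValue": list(values)}},
                    ]
                }
            },
        }
    }


def all_filters(selections, skip=None):
    """Bool filter combining every selected facet except `skip`."""
    return {
        "bool": {
            "filter": [
                facet_filter(name, values)
                for name, values in selections.items()
                if name != skip
            ]
        }
    }


def build_facet_query(selections, size=1000):
    """Build the n+1 aggregation request body for the given selections."""
    aggs = {
        # General aggregation: all filters applied, all facets counted.
        "agg_attr_strings_filter": {
            "filter": all_filters(selections),
            "aggs": {
                "agg_attr_strings": {
                    "nested": {"path": "attributeStrings"},
                    "aggs": {
                        "attr_name": {
                            "terms": {"field": "attributeStrings.facetName"},
                            "aggs": {
                                "attr_value": {
                                    "terms": {
                                        "field": "attributeStrings.facetValue",
                                        "size": size,
                                        # Kept from the post; Elasticsearch 6.0+
                                        # deprecates "_term" in favor of "_key".
                                        "order": [{"_term": "asc"}],
                                    }
                                }
                            },
                        }
                    },
                }
            },
        }
    }
    # One special aggregation per selected facet, skipping its own filter.
    for name in selections:
        key = f"special_agg_{name.lower()}"
        aggs[key] = {
            "filter": all_filters(selections, skip=name),
            "aggs": {
                key: {
                    "nested": {"path": "attributeStrings"},
                    "aggs": {
                        "agg_filtered_special": {
                            "filter": {"match": {"attributeStrings.facetName": name}},
                            "aggs": {
                                "facet_value": {
                                    "terms": {
                                        "size": size,
                                        "field": "attributeStrings.facetValue",
                                    }
                                }
                            },
                        }
                    },
                }
            },
        }
    return {"aggs": aggs}


if __name__ == "__main__":
    body = build_facet_query({"Property": ["Organic"], "Brand": ["Adidas"]})
    print(json.dumps(body, indent=2))
```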

This query looks super big and scary, but generating it can be done with a few dozen lines of code. When parsing the results, parse the general aggregation (the one that uses all filters) first, and the special facet aggregations after it. In the example above, first parse the results from agg_attr_strings_filter. Those results will also contain aggregation values for Brand and Property, which should be overwritten by the values from special_agg_property and special_agg_brand. Also, this query is efficient. Elasticsearch does a good job of caching individual filter clauses, so applying the same filters in different parts of the query should be cheap.
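
The parse-then-overwrite step described above can be sketched like this. The helper is hypothetical and not part of any client library. It assumes the general aggregation has already been turned into a `{facetName: {facetValue: count}}` dict. It also assumes the special aggregations follow the `special_agg_<name>` layout of the example query.

```python
def parse_special(aggregations, key):
    """Read {facetValue: doc_count} from one special_agg_* result."""
    buckets = aggregations[key][key]["agg_filtered_special"]["facet_value"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


def merge_facets(general, aggregations, selected_names):
    """Overwrite each selected facet in `general` with its special aggregation.

    `general` is {facetName: {facetValue: count}} parsed from
    agg_attr_strings_filter; `aggregations` is the response's "aggregations"
    object. The input dict is not modified.
    """
    merged = dict(general)
    for name in selected_names:
        merged[name] = parse_special(aggregations, f"special_agg_{name.lower()}")
    return merged
```

Facets that are not selected keep the counts from the general aggregation, which already has every filter applied.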

But that seems very complex and kind of defeats the point of the model described in the article, so I hope there's a cleaner solution and someone can suggest something to try.

There is really no way around the fact that you need to apply different filters to different facets while also having different query filters. If you need to support "correct e-commerce facet behavior", you will have a complex query :)

Disclaimer: I'm a co-author of the article mentioned above.
