Query product catalog RavenDB store for spec aggregate over arbitrary collection of products

Views: 104 Published: 2021/6/8 19:03:09 Tags: nosql ravendb faceted-search

Problem description

This is a follow-up to this question.

I have the following model:

class Product {
  public string Id { get; set; }
  public string[] Specs { get; set; }
  public int CategoryId { get; set; }
}

The "Specs" array stores product specification name value pairs joined by a special character. For example if a product is colored blue the spec string would be "Color~Blue". Representing specs in this way allows querying for products having multiple spec values specified by a query. There are two principal queries that I would like to support:

1. Get all products in a given category.
2. Get all products in a given category that have a set of specified specs.
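
To make the two queries concrete, here is a minimal in-memory sketch using plain LINQ. It is not RavenDB client code: the `ProductFilters` class and its `Spec` helper are hypothetical names introduced for illustration, and the `Product` class simply repeats the model above so that the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same shape as the Product model above, repeated so the sketch is self-contained.
public class Product
{
    public string Id { get; set; }
    public string[] Specs { get; set; }
    public int CategoryId { get; set; }
}

// Hypothetical helper class; an in-memory stand-in for the two queries.
public static class ProductFilters
{
    // Builds the "Name~Value" string that is stored in Product.Specs.
    public static string Spec(string name, string value) => name + "~" + value;

    // Query #1: all products in a given category.
    public static IEnumerable<Product> InCategory(IEnumerable<Product> products, int categoryId) =>
        products.Where(p => p.CategoryId == categoryId);

    // Query #2: products in a category that carry every requested spec string.
    public static IEnumerable<Product> InCategoryWithSpecs(
        IEnumerable<Product> products, int categoryId, IReadOnlyCollection<string> requiredSpecs) =>
        InCategory(products, categoryId)
            .Where(p => p.Specs != null && requiredSpecs.All(s => p.Specs.Contains(s)));
}
```

Because each spec is a single string, "has all of these specs" reduces to a plain containment check per requested value, which is what makes the joined name-value representation convenient for filtering.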

This works well with RavenDB. However, in addition to the products satisfying a given query I would like to return a result set which contains all spec name-value pairs for the set of products specified by the query. The spec name-value pairs should be grouped by the name and value of the spec and contain a count of products which have a given spec name-value pair. For query #1 I created the following map reduce index:

class CategorySpecGroups {
    public int CategoryId { get; set; }
    public string Spec { get; set; }
    public int Count { get; set; }
}


public class SpecGroups_ByCategoryId : AbstractIndexCreationTask<Product, CategorySpecGroups>
{
    public SpecGroups_ByCategoryId()
    {
        this.Map = products => from product in products
                               where product.Specs != null
                               from spec in product.Specs
                               select new
                               {
                                   CategoryId = product.CategoryId,
                                   Spec = spec,
                                   Count = 1
                               };

        this.Reduce = results => from result in results
                                 group result by new { result.CategoryId, result.Spec } into g
                                 select new
                                 {
                                     CategoryId = g.Key.CategoryId,
                                     Spec = g.Key.Spec,
                                     Count = g.Sum(x => x.Count)
                                 };
    }
}

I can then query this index and get all spec name-value pairs in a given category. The problem I am running into is to get the same result set but for a query which filters both by a category and a set of spec name-value pairs. When using SQL this result set would be obtained by doing a group by over a set of products filtered by category and specs. In general, this type of query is expensive but when filtering by both category and specs the product sets are normally small, though not small enough to fit into a single page - they may contain up to 1000 products. For reference, MongoDB supports a group method which can be used to achieve the same result set. This performs the ad hoc grouping server side and the performance is acceptable.

How can I get this type of result set using RavenDB?

One possible solution is to get all the products for a query and perform the grouping in memory. Another option is to create a map-reduce index as above, though the challenge with this would be deducing all possible spec selections that can be made for a given category; additionally, this type of index might explode in size.
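
The first option, grouping in memory, could look roughly like the sketch below. It assumes the filtered products (up to about 1000) have already been loaded from the store by whatever paging mechanism is in use; `SpecCount` and `InMemorySpecGroups` are hypothetical names, and `Product` repeats the model above so the block stands alone.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same shape as the Product model above, repeated so the sketch is self-contained.
public class Product
{
    public string Id { get; set; }
    public string[] Specs { get; set; }
    public int CategoryId { get; set; }
}

// Hypothetical result type: one row per distinct "Name~Value" spec string.
public class SpecCount
{
    public string Spec { get; set; }
    public int Count { get; set; }
}

public static class InMemorySpecGroups
{
    // Groups an already-filtered, already-loaded product set by spec string
    // and counts how many products carry each spec.
    public static List<SpecCount> Compute(IEnumerable<Product> filteredProducts) =>
        filteredProducts
            .Where(p => p.Specs != null)
            // Distinct() is an assumption of this sketch: a spec repeated on
            // one product is counted once for that product.
            .SelectMany(p => p.Specs.Distinct())
            .GroupBy(spec => spec)
            .Select(g => new SpecCount { Spec = g.Key, Count = g.Count() })
            .OrderBy(c => c.Spec, StringComparer.Ordinal)
            .ToList();
}
```

The grouping itself is cheap for sets of this size; the cost of this approach is mostly in transferring every matching product to the client before the counts can be computed.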

For an example, take a look at this fastener category page. The user can filter their selection by selecting attributes. When an attribute is selected it narrows the selection of products and displays the attributes within the new set of products. This type of interaction is typically called faceted search.

Edit

In the meantime, I will be attempting a solution using Solr as they support faceted search out of the box.

Edit 2

It appears that RavenDB also supports faceted search (which of course makes sense, indexes are stored by Lucene just like Solr). I will be exploring this and post updates.

Edit 3

The RavenDB faceted search functionality works as expected. I store a facet setup document for each category ID, which is used to calculate facets for a query within a given category. The issue I am having now is performance. For a collection of 500k products with 4500 distinct categories, resulting in 4500 facet setup documents, a query by category ID takes about 16 seconds when also querying for facets and about 0.05 seconds when not querying for facets. The particular category tested contains about 6k products, 23 distinct facets and 2k distinct facet name-range combinations. After looking at the code in FacetedQueryRunner, it appears that a facets query results in a Lucene query for every facet name-value combination to get the counts, as well as a query for each facet name to get the terms. One problem with the implementation is that it retrieves all the distinct terms for a given facet name regardless of the query, even though restricting the terms to those matching the query would in most cases significantly reduce the number of terms for a facet and therefore the number of Lucene queries. One way to improve performance here would be to store a MapReduce computed result set (as shown above) for each facet setup document, which could then be queried to get all the distinct terms when further filtering by facets. The overall performance, however, may still be too slow.
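
As a rough back-of-the-envelope check on those numbers, the sketch below counts the Lucene queries implied by the description above (one per facet name for the terms, one per name-value combination for the counts) and divides the observed time by that count. The `FacetQueryEstimate` class is hypothetical, and the per-query figure is only an estimate that assumes the 16 seconds are spent almost entirely in those queries.

```csharp
using System;

// Hypothetical helper for estimating the cost of a facets query
// from the figures quoted in the text.
public static class FacetQueryEstimate
{
    // One query per facet name (terms) plus one per name-value combination (counts).
    public static int LuceneQueries(int facetNames, int nameValueCombinations) =>
        facetNames + nameValueCombinations;

    // Average time per Lucene query, in milliseconds.
    public static double MillisecondsPerQuery(double totalSeconds, int queries) =>
        totalSeconds * 1000.0 / queries;
}
```

With 23 facets and about 2000 name-value combinations, `LuceneQueries(23, 2000)` gives 2023 queries, and `MillisecondsPerQuery(16.0, 2023)` comes out just under 8 ms per query. Under these assumptions each individual query is fast, so the total time is driven by the number of queries, which is why reducing the set of terms matters.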
