Redissearch 总回报每组前 5 名 [英] Redisearch aggregate return top 5 of each group

查看:38
本文介绍了Redissearch 总回报每组前 5 名的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设我有这种格式的文档:

Suppose I have documents in this format:

product_name TEXT tags TAG score NUMERIC 

[product1, [tag1, tag2, tag3], 10]
[product2, [tag2, tag3, tag4], 100]
....

我想要一个查询,按照产品得分最高总和的顺序返回标签,以及每个标签的前 5 个产品:

I want a query to return the tags in the order of the highest sum of product score and also the top 5 of the products for each tag:

[tag3, 110, [product2, product 1]]
[tag2, 110, [product2, product 1]]
[tag4, 100, [product2]]
[tag1, 10, [product 1]]

到目前为止我所拥有的是分别存储每个产品/标签键(对每个标签重复),因此对于每个产品,我们为每个标签都有一个单独的文档,ID 是产品名称和标签的组合:product_name TEXT标记 TAG 分数 NUMERIC.现在我可以运行一个聚合查询来获取顶级标签的列表:

What I have so far is storing each product/tag key separately (repeated for each tag) so for each product we have one separate doc for each tag and the id is combination of product name and tag: product_name TEXT tag TAG score NUMERIC. Now I can run an aggregate query to get the list of the top tags:

FT.AGGREGATE product_tags * 
   GROUP BY 1 @TAG 
     REDUCE SUM 1 @score as total_score
   SORT BY 2 @total_score DESC

这将按顺序为我提供顶部标签,但如果我想为每个标签获取前 5 个产品,我发现只有 REDUCE TOLIST 1 @product_name 它将返回所有未排序的产品,并且有 REDUCE FIRST_VALUE 4 @product_name BY @score DESC 将只返回第一个顶级产品.

This will give me the top tags in order but if I want to get top 5 products for each tag I found there is only REDUCE TOLIST 1 @product_name which will return all the products not sorted and there is REDUCE FIRST_VALUE 4 @product_name BY @score DESC which will return only the first top product.

有什么方法可以让我们在一个查询中为每个标签获得 5 个顶级产品.如果不是,是否可以通过某种方式更改文档存储格式(或添加其他格式)以使此类查询成为可能或尽可能少地查询?

Is there any way to get let's say 5 top products for each tag in one query. If not is it possible to change the document storage format (or add additional one) in a way to make this kind of query possible or with as little queries as possible?

没关系,但我使用的是 python Redisearch 客户端.

Shouldn't matter but I am using python Redisearch client.

推荐答案

第一:

  • 确保禁用您不会使用的功能(NOOFFSETSNOHLNOFREQS, STOPWORDS 0)
  • 使用 SORTABLE 作为您的 NUMERIC score.
  • Make sure to disable features you won't use (NOOFFSETS, NOHL, NOFREQS, STOPWORDS 0)
  • Use SORTABLE for your NUMERIC score.

这是我用来测试的架构:

Here is the schema I used to test:

FT.CREATE product_tags NOOFFSETS NOHL NOFREQS STOPWORDS 0
    SCHEMA product_name TEXT tags TAG score NUMERIC SORTABLE

您想将 FT.AGGREGATE 视为管道.

You want to think of FT.AGGREGATE as a pipeline.

第一步是按@score 对产品进行排序,以便稍后在管道中,当我们REDUCE TOLIST 1 @product_name 时,列表会排序:

The first step will be to sort the products by @score, so that later, down in the pipeline, when we REDUCE TOLIST 1 @product_name, the list comes out sorted:

SORTBY 2 @score DESC

我认为您已经在执行 LOAD/APPLY 来处理标签,否则 TAG 字段将按完整逗号分组- 分隔的字符串标签列表,每个产品.请参阅在标签字段问题上允许 GROUPBY.所以我们的下一步是:

I think you are already doing LOAD/APPLY to deal with the tags, as TAG fields would otherwise be grouped by the full comma-separated string tags-list, per product. See Allow GROUPBY on tag fields issue. So our next step is in the pipeline is:

LOAD 1 @tags 
APPLY split(@tags) as TAG 

然后我们按@TAG 分组,并应用两次缩减.我们的产品列表会排序出来.

We then group by @TAG, and apply the two reductions. Our products list will come out sorted.

GROUPBY 1 @TAG
    REDUCE SUM 1 @score AS total_score
    REDUCE TOLIST 1 @product_name AS products

最后,我们按@total_score排序:

SORTBY 2 @total_score DESC

这里是命令的最终视图:

Here a final view of the command:

FT.AGGREGATE product_tags *
    SORTBY 2 @score DESC 
    LOAD 1 @tags 
    APPLY split(@tags) as TAG
    GROUPBY 1 @TAG
        REDUCE SUM 1 @score AS total_score 
        REDUCE TOLIST 1 @product_name AS products
    SORTBY 2 @total_score DESC

这里有完整的命令列表来说明结果.我使用 productXX 和 score XX 来轻松直观地验证产品的排序.

Here a full list of commands to illustrate the result. I used productXX with score XX to easily verify visually the sorting of products.

> FT.CREATE product_tags NOOFFSETS NOHL NOFREQS STOPWORDS 0 SCHEMA product_name TEXT tags TAG score NUMERIC SORTABLE
OK
> FT.ADD product_tags pt:product10 1 FIELDS product_name product10 tags tag2,tag3,tag4 score 10
OK
> FT.ADD product_tags pt:product1 1 FIELDS product_name product1  tags tag1,tag2,tag3 score 1
OK
> FT.ADD product_tags pt:product100 1 FIELDS product_name product100 tags tag2,tag3 score 100
OK
> FT.ADD product_tags pt:product5 1 FIELDS product_name product5 tags tag1,tag4 score 5
OK
> FT.SEARCH product_tags *
1) (integer) 4
2) "pt:product5"
3) 1) "product_name"
   2) "product5"
   3) "tags"
   4) "tag1,tag4"
   5) "score"
   6) "5"
4) "pt:product100"
5) 1) "product_name"
   2) "product100"
   3) "tags"
   4) "tag2,tag3"
   5) "score"
   6) "100"
6) "pt:product1"
7) 1) "product_name"
   2) "product1"
   3) "tags"
   4) "tag1,tag2,tag3"
   5) "score"
   6) "1"
8) "pt:product10"
9) 1) "product_name"
   2) "product10"
   3) "tags"
   4) "tag2,tag3,tag4"
   5) "score"
   6) "10"
> FT.AGGREGATE product_tags * SORTBY 2 @score DESC LOAD 1 @tags APPLY split(@tags) as TAG GROUPBY 1 @TAG REDUCE SUM 1 @score AS total_score REDUCE TOLIST 1 @product_name AS products SORTBY 2 @total_score DESC
1) (integer) 4
2) 1) "TAG"
   2) "tag2"
   3) "total_score"
   4) "111"
   5) "products"
   6) 1) "product100"
      2) "product10"
      3) "product1"
3) 1) "TAG"
   2) "tag3"
   3) "total_score"
   4) "111"
   5) "products"
   6) 1) "product100"
      2) "product10"
      3) "product1"
4) 1) "TAG"
   2) "tag4"
   3) "total_score"
   4) "15"
   5) "products"
   6) 1) "product10"
      2) "product5"
5) 1) "TAG"
   2) "tag1"
   3) "total_score"
   4) "6"
   5) "products"
   6) 1) "product5"
      2) "product1"

您将获得已排序的完整产品列表,而不仅仅是前 5 名.复杂性方面没有区别,我们付出了代价.影响在于缓冲、网络负载和您的客户端.

You are getting the full list of products sorted, not just the top 5. Complexity-wise it makes no difference, we paid the price. The impact is in buffering, network payload, and your client.

您可以使用 Lua 脚本限制在前 5 名:

You can limit to top 5 using a Lua script:

eval "local arr = redis.call('FT.AGGREGATE', KEYS[1], '*', 'SORTBY', '2', '@score', 'DESC', 'LOAD', '1', '@tags', 'APPLY', 'split(@tags)', 'as', 'TAG', 'GROUPBY', '1', '@TAG', 'REDUCE', 'SUM', '1', '@score', 'AS', 'total_score', 'REDUCE', 'TOLIST', '1', '@product_name', 'AS', 'products', 'SORTBY', '2', '@total_score', 'DESC') \n for i=2,(arr[1]+1) do \n arr[i][6] = {unpack(arr[i][6], 1, ARGV[1])} \n end \n return arr" 1 product_tags 5

这是上面 Lua 脚本的友好视图:

Here a friendly view of the Lua script above:

local arr = redis.call('FT.AGGREGATE', KEYS[1], ..., 'DESC')
for i=2,(arr[1]+1) do 
    arr[i][6] = {unpack(arr[i][6], 1, ARGV[1])}
end
return arr

我们传递一个键(索引)和一个参数(顶级产品的限制,在您的情况下为 5):1 product_tags 3.

We are passing one key (the index) and one argument (the limit for top products, 5 in your case): 1 product_tags 3.

因此,我们将影响限制在仅缓冲、保存的网络有效负载和客户端负载上.

With this, we limited the impact to buffering only, saved network payload and load on your client.

这篇关于Redissearch 总回报每组前 5 名的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆