PostgreSQL优化包含CTE的Window函数的查询性能 [英] PostgreSQL optimize query performance that contains Window function with CTE

查看:60
本文介绍了PostgreSQL优化包含CTE的Window函数的查询性能的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

此处列 amenity_category parent_path JSONB 列,其值类似于["Tv","Air Condition"]和[分别为"200000","20100","203"].除此之外,其他列都是普通的 varchar numeric 类型.我大约有 250万行,其主键位于 id 上,并且已对其进行了索引.基本上, rp.parent_path 匹配多行时,最初的 CTE 部分会花费一些时间.

Here the column amenity_category and parent_path is JSONB column with value like ["Tv","Air Condition"] and ["20000","20100","203"] respectively. Apart from that other columns are normal varchar and numeric type. I've around 2.5M rows with primary key on id and it is indexed. Basically the initial CTE part is taking time when rp.parent_path match multiple rows.

样本数据集:

当前查询:

WITH CTE AS
(
  SELECT id,
  property_name,
  property_type_category,
  review_score, 
  amenity_category.name, 
  count(*) AS cnt FROM table_name rp, 
  jsonb_array_elements_text(rp.amenity_categories) amenity_category(name)
  WHERE rp.parent_path ? '203' AND number_of_review >= 1
  GROUP BY amenity_category.name,id 
),
CTE2 as
(
  SELECT id, property_name,property_type_category,name,
  ROW_NUMBER() OVER (PARTITION BY property_type_category,
  name ORDER BY review_score DESC),
  COUNT(id) OVER (PARTITION BY property_type_category,
  name ORDER BY name DESC) 
  FROM CTE
)

SELECT id, property_name, property_type_category, name, COUNT 
FROM CTE2
where row_number = 1

当前输出:

所以我的基本问题是,我还有其他方法可以重新编写此查询或优化当前查询吗?

推荐答案

如果可以安全地假设 amenity_categories 中的数组元素是不同的(没有重复的数组元素),我们可以从根本上简化为:/p>

If it's safe to assume that array elements in amenity_categories are distinct (no duplicate array elements), we can radically simplify to:

SELECT DISTINCT ON (property_type_category, ac.name)
       id, property_name, property_type_category, ac.name
     , COUNT(*) OVER (PARTITION BY property_type_category, ac.name) AS count
FROM   table_name rp, jsonb_array_elements_text(rp.amenity_categories) ac(name)
WHERE  parent_path ? '203'
AND    number_of_review >= 1
ORDER  BY property_type_category, ac.name, review_score DESC;

如果 review_score 可以为NULL,请执行以下操作:

If review_score can be NULL, make that:

...
ORDER  BY property_type_category, ac.name, review_score DESC NULLS LAST;

之所以可行,是因为将 DISTINCT ON 应用于最后一步(在窗口功能之后).参见:

This works, because DISTINCT ON is applied as last step (after window functions). See:

parent_path number_of_review 应该应该被索引.取决于数据分布和 WHERE 条件的选择性,您没有透露.

parent_path and number_of_review should probably be indexed. Depends on data distribution and selectivity of the WHERE conditions, which you didn't disclose.

关于 DISTINCT ON :

假设 id NOT NULL count(*)更快并且等效于 count(id)

Assuming id is NOT NULL, count(*) is faster and equivalent to count(id).

这篇关于PostgreSQL优化包含CTE的Window函数的查询性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆