Optimized querying in PostgreSQL

Views: 214  Published: 2020/5/30 1:10:15  Tags: postgresql, query-optimization, greatest-n-per-group, postgresql-8.0


Problem description

Assume you have a table named tracker (etl_change_fact in the queries below) with the following records.

issue_id  |  ingest_date         |  verb,status
10         2015-01-24 00:00:00    1,1
10         2015-01-25 00:00:00    2,2
10         2015-01-26 00:00:00    2,3
10         2015-01-27 00:00:00    3,4
11         2015-01-10 00:00:00    1,3
11         2015-01-11 00:00:00    2,4

I need the following results:

10         2015-01-26 00:00:00    2,3
11         2015-01-11 00:00:00    2,4

I am trying this query:

select * 
from etl_change_fact 
where ingest_date = (select max(ingest_date) 
                     from etl_change_fact);

However, this gives me only this record:

10    2015-01-26 00:00:00    2,3

But I want one record per issue (issue_id), chosen by

(a) max(ingest_date) AND

(b) verb column priority (2 - first preferred, 1 - second preferred, 3 - last preferred)

Hence, I need the following results:

10    2015-01-26 00:00:00    2,3
11    2015-01-11 00:00:00    2,4

Please help me query this efficiently.

P.S.: I am not going to index ingest_date because I am going to set it as the "distribution key" in a distributed computing setup. I am new to data warehousing and querying.

Hence, please help me with an optimized way to query my TB-sized DB.

Solution

This is a typical "greatest-n-per-group" problem. If you search for this tag here, you'll find plenty of solutions, including ones for MySQL.

For Postgres, the quickest way to do it is to use distinct on (which is a Postgres-proprietary extension to the SQL language):

select distinct on (issue_id) issue_id, ingest_date, verb, status
from etl_change_fact
order by issue_id, 
         case verb 
            when 2 then 1 
            when 1 then 2
            else 3
         end, ingest_date desc;
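
To see which row distinct on keeps for each issue_id, the same selection rule can be sketched in plain Python on the sample rows from the question. This only illustrates the semantics (order the rows by the order by keys, then keep the first row of each issue_id group); it says nothing about how Postgres executes the query. The names ROWS, verb_rank and pick_per_issue are made up for this sketch.

```python
from itertools import groupby

# Sample rows from the question: (issue_id, ingest_date, verb, status)
ROWS = [
    (10, "2015-01-24 00:00:00", 1, 1),
    (10, "2015-01-25 00:00:00", 2, 2),
    (10, "2015-01-26 00:00:00", 2, 3),
    (10, "2015-01-27 00:00:00", 3, 4),
    (11, "2015-01-10 00:00:00", 1, 3),
    (11, "2015-01-11 00:00:00", 2, 4),
]


def verb_rank(verb):
    # Mirrors the CASE expression: verb 2 first, verb 1 second, anything else last.
    return {2: 1, 1: 2}.get(verb, 3)


def pick_per_issue(rows):
    # Emulates: order by issue_id, <verb rank>, ingest_date desc.
    # Sort newest first, then rely on the stable sort for the leading keys.
    newest_first = sorted(rows, key=lambda r: r[1], reverse=True)
    ordered = sorted(newest_first, key=lambda r: (r[0], verb_rank(r[2])))
    # "distinct on (issue_id)" keeps the first row of each issue_id group.
    return [next(group) for _, group in groupby(ordered, key=lambda r: r[0])]


result = pick_per_issue(ROWS)
for row in result:
    print(row)
```

For the sample data this selects the 2015-01-26 row for issue 10 and the 2015-01-11 row for issue 11, which is the result asked for in the question.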

You can also extend your original query with a correlated sub-query. Note that this version picks the row with the latest ingest_date for each issue_id and does not apply the verb priority:

select f1.* 
from etl_change_fact f1
where f1.ingest_date = (select max(f2.ingest_date) 
                        from etl_change_fact f2
                        where f1.issue_id = f2.issue_id);
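
A quick way to check what this query returns is to run it against the sample rows. The sketch below uses Python's built-in sqlite3 module with an in-memory database purely for convenience; that substitution is an assumption of this example and not part of the original setup. The trailing order by f1.issue_id is also added here, only to make the output order deterministic.

```python
import sqlite3

# In-memory SQLite stands in for Postgres here (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table etl_change_fact ("
    "issue_id integer, ingest_date text, verb integer, status integer)"
)
conn.executemany(
    "insert into etl_change_fact values (?, ?, ?, ?)",
    [
        (10, "2015-01-24 00:00:00", 1, 1),
        (10, "2015-01-25 00:00:00", 2, 2),
        (10, "2015-01-26 00:00:00", 2, 3),
        (10, "2015-01-27 00:00:00", 3, 4),
        (11, "2015-01-10 00:00:00", 1, 3),
        (11, "2015-01-11 00:00:00", 2, 4),
    ],
)

# Correlated sub-query: for each row, compare against the max date of its own issue_id.
latest_rows = conn.execute(
    """
    select f1.*
    from etl_change_fact f1
    where f1.ingest_date = (select max(f2.ingest_date)
                            from etl_change_fact f2
                            where f1.issue_id = f2.issue_id)
    order by f1.issue_id
    """
).fetchall()

for row in latest_rows:
    print(row)
conn.close()
```

On the sample data this returns the 2015-01-27 row for issue 10, because only ingest_date is compared; the verb priority plays no part in this variant.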

Edit

For an outdated and unsupported Postgres version, you can probably get away with something like this:

select f1.* 
from etl_change_fact f1
where f1.ingest_date = (select f2.ingest_date
                        from etl_change_fact f2
                        where f1.issue_id = f2.issue_id
                        order by case f2.verb 
                                   when 2 then 1 
                                   when 1 then 2
                                   else 3
                                 end, f2.ingest_date desc
                        limit 1);

SQLFiddle example: http://sqlfiddle.com/#!15/3bb05/1
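
The fallback query with order by and limit 1 can be checked against the sample rows in the same way. This is a minimal sketch that assumes an in-memory SQLite database (via Python's sqlite3 module) as a stand-in for an old Postgres server; the inner columns are qualified with f2 and an outer order by f1.issue_id is added here only to keep the output order stable.

```python
import sqlite3

# In-memory SQLite stands in for an old Postgres server (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table etl_change_fact ("
    "issue_id integer, ingest_date text, verb integer, status integer)"
)
conn.executemany(
    "insert into etl_change_fact values (?, ?, ?, ?)",
    [
        (10, "2015-01-24 00:00:00", 1, 1),
        (10, "2015-01-25 00:00:00", 2, 2),
        (10, "2015-01-26 00:00:00", 2, 3),
        (10, "2015-01-27 00:00:00", 3, 4),
        (11, "2015-01-10 00:00:00", 1, 3),
        (11, "2015-01-11 00:00:00", 2, 4),
    ],
)

# The sub-query ranks the rows of one issue_id by verb priority, then by
# newest ingest_date, and hands the winning date back to the outer query.
preferred_rows = conn.execute(
    """
    select f1.*
    from etl_change_fact f1
    where f1.ingest_date = (select f2.ingest_date
                            from etl_change_fact f2
                            where f1.issue_id = f2.issue_id
                            order by case f2.verb
                                       when 2 then 1
                                       when 1 then 2
                                       else 3
                                     end, f2.ingest_date desc
                            limit 1)
    order by f1.issue_id
    """
).fetchall()

for row in preferred_rows:
    print(row)
conn.close()
```

Because the outer query matches on ingest_date alone, two rows of the same issue_id that share the winning ingest_date would both be returned; the sample data has no such ties.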
