How can I write this Postgres query in Amazon Redshift such that it is as optimized as it was in Postgres?

Question

Here is the original query I was using in Postgres:

SELECT a.id,
    (SELECT val
       FROM database.detail x
      WHERE name = 'blablah'
        AND x.id = b.id) AS myGroup,
    c.username,
    a.someCode,
    a.timeTaken,
    a.date ::timestamp WITH time ZONE AT time ZONE 'PST' AS date,
    SUM (CASE WHEN (b.name = 'name1') THEN b.val ::INTEGER ELSE 0 END ) AS name11,
    SUM (CASE WHEN (b.name = 'name2') THEN b.val ::INTEGER ELSE 0 END ) AS name12
FROM
    database.myTable a,
    database.detail b,
    database.client c
WHERE
    a.id = b.id
    AND a.c_id = c.c_id
    AND a.date > current_date - interval '2 weeks'
GROUP BY 1, 2, 3, 4, 5, 6

This is how I converted it into an Amazon Redshift query:

SELECT a.id,
    b.val AS myGroup,
    c.username,
    a.someCode,
    a.timeTaken,
    convert_timezone('PST', a.date) AS date,
    SUM (CASE WHEN (b.name = 'name1') THEN b.val ::INTEGER ELSE 0 END ) AS name11,
    SUM (CASE WHEN (b.name = 'name2') THEN b.val ::INTEGER ELSE 0 END ) AS name12
FROM
    database.myTable a,
    database.detail b,
    database.client c
WHERE
    a.id = b.id
    AND b.name = 'blablah'
    AND a.c_id = c.c_id
    AND a.date > current_date - interval '2 weeks'
GROUP BY 1, 2, 3, 4, 5, 6 LIMIT 10

The CASE expressions do not behave as expected: the values of name11 and name12 are all zero. My Postgres query returns valid values for these columns, but the Redshift query does not.

Also, this query is very slow. The Postgres query takes about 150 ms, while this one takes 2 minutes.

How can we do this better?

Recommended answer

Redshift query optimization depends on the cluster, table design, data loading, and vacuuming and analyzing the tables.

Let me address some core points from the list above:

  1. Make sure your tables mytable, detail, and client have proper SORT_KEY and DIST_KEY settings.
  2. Make sure all the tables in the join are analyzed and vacuumed properly.
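As a rough sketch of both points, the statements below set distribution and sort keys on the three tables and then vacuum and analyze them. The key choices are assumptions drawn only from the joins and the date filter in the question (join on id, filter on date, and client assumed small enough to copy to every node), so check them against your real data volumes before applying anything.

```sql
-- Assumption: myTable and detail are joined on id, so co-locate matching rows.
ALTER TABLE database.myTable ALTER DISTSTYLE KEY DISTKEY id;
ALTER TABLE database.detail  ALTER DISTSTYLE KEY DISTKEY id;

-- Assumption: client is a small lookup table, so keep a full copy on every node.
ALTER TABLE database.client ALTER DISTSTYLE ALL;

-- The query filters on a.date, so sorting myTable by date lets Redshift skip blocks.
ALTER TABLE database.myTable ALTER SORTKEY (date);

-- Re-sort rows and reclaim space, then refresh planner statistics.
-- VACUUM cannot run inside a transaction block.
VACUUM database.myTable;
VACUUM database.detail;
VACUUM database.client;
ANALYZE database.myTable;
ANALYZE database.detail;
ANALYZE database.client;
```

Running EXPLAIN on the query before and after such a change shows whether the joins still need to redistribute or broadcast rows between nodes.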

Here is another version of the same SQL, written for Redshift.

Some tweaks I made:

  1. Used a WITH clause to optimize cluster-level computation.
  2. Used explicit joins, and made sure the join type (inner versus left/right) matches the data.
  3. Put the date range in its own WITH-clause table, date_range, for a kind of object orientation.
  4. Applied GROUP BY in the main SQL below.

My Redshift SQL version

/** Date Range Computation **/
with date_range as (
    select ( current_date - interval '2 weeks' ) as two_weeks
),
/** Filter main ResultSet **/
myGroupSet as (
    SELECT b.val AS myGroup,
           c.username,
           a.someCode,
           a.timeTaken,
           /* date must be selected here because the outer query groups by it */
           convert_timezone('PST', a.date) AS date,
           (case when (b.name = 'name1') THEN b.val::INTEGER ELSE 0 END ) as name11,
           (case when (b.name = 'name2') THEN b.val::INTEGER ELSE 0 END ) as name12
      FROM database.myTable a
      /* the join on date_range applies the two-week filter */
      join date_range dr on a.date > dr.two_weeks
      join database.detail b on b.id = a.id
      join database.client c on c.c_id = a.c_id
)
/** Apply Aggregation **/
select myGroup, username, someCode, timeTaken, date,
       sum(name11) as name11, sum(name12) as name12
  from myGroupSet
  group by myGroup, username, someCode, timeTaken, date