如何在 Redshift 中创建日期表? [英] How do I create a dates table in Redshift?

查看:23
本文介绍了如何在 Redshift 中创建日期表?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

tl;dr:我想在 Redshift 中生成一个日期表,以便更容易生成报告.最好不需要红移中已有的大表,需要上传 csv 文件.

tl;dr: I want to generate a dates table in Redshift in order to make a report easier to generate. Preferable without needing large tables already in redshift, needing to upload a csv file.

长版:我正在编写一份报告,我必须对一周中每天创建的新项目进行平均.日期范围可能跨越数月或更长时间,因此可能有 5 个星期一但只有 4 个星期日,这会使计算变得有点棘手.此外,我不能保证每天有单个项目的实例,尤其是当用户开始切片数据时.哪个,这是绊倒 BI 工具.

long version: I am working on a report where I have to average new items created per day of the week. The date range could span months or more, so there could be, say, 5 Mondays but only 4 Sundays, which can make the math a little tricky. Also, I am not guaranteed an instance of a single item per day, especially once a user starts slicing the data. Which, this is tripping up the BI tool.

解决此问题的最佳方法很可能是日期表.但是,大多数日期表教程都使用 Redshift 不可用或不完全支持的 SQL 命令(我在看着你,generate_series).

The best way to tackle this problem is most likely a dates table. However, most of the tutorials for dates tables use SQL commands that are not available or not fully supported by Redshift (I'm looking at you, generate_series).

有没有一种在 Redshift 中生成日期表的简单方法?

我试图使用的代码:(基于这个同样无效的建议:http://elliot.land/post/building-a-date-dimension-table-in-redshift )

The code I was attempting to use: (based on this also-not-working recommendation: http://elliot.land/post/building-a-date-dimension-table-in-redshift )

CREATE TABLE facts.dates (
  "date_id"              INTEGER                     NOT NULL PRIMARY KEY,

  -- DATE
  "full_date"            DATE                        NOT NULL,

  -- YEAR
  "year_number"          SMALLINT                    NOT NULL,
  "year_week_number"     SMALLINT                    NOT NULL,
  "year_day_number"      SMALLINT                    NOT NULL,

  -- QUARTER
  "qtr_number"           SMALLINT                    NOT NULL,

  -- MONTH
  "month_number"         SMALLINT                    NOT NULL,
  "month_name"           CHAR(9)                     NOT NULL,
  "month_day_number"     SMALLINT                    NOT NULL,

  -- WEEK
  "week_day_number"      SMALLINT                    NOT NULL,

  -- DAY
  "day_name"             CHAR(9)                     NOT NULL,
  "day_is_weekday"       SMALLINT                    NOT NULL,
  "day_is_last_of_month" SMALLINT                    NOT NULL
) DISTSTYLE ALL SORTKEY (date_id)
;


INSERT INTO facts.dates
(
   "date_id"
  ,"full_date"
  ,"year_number"
  ,"year_week_number"
  ,"year_day_number"

  -- QUARTER
  ,"qtr_number"

  -- MONTH
  ,"month_number"
  ,"month_name"
  ,"month_day_number"

  -- WEEK
  ,"week_day_number"

  -- DAY
  ,"day_name"
  ,"day_is_weekday"
  ,"day_is_last_of_month"
)
  SELECT
    cast(seq + 1 AS INTEGER)                                      AS date_id,

    -- DATE
    datum                                                         AS full_date,

    -- YEAR
    cast(extract(YEAR FROM datum) AS SMALLINT)                    AS year_number,
    cast(extract(WEEK FROM datum) AS SMALLINT)                    AS year_week_number,
    cast(extract(DOY FROM datum) AS SMALLINT)                     AS year_day_number,

    -- QUARTER
    cast(to_char(datum, 'Q') AS SMALLINT)                         AS qtr_number,

    -- MONTH
    cast(extract(MONTH FROM datum) AS SMALLINT)                   AS month_number,
    to_char(datum, 'Month')                                       AS month_name,
    cast(extract(DAY FROM datum) AS SMALLINT)                     AS month_day_number,

    -- WEEK
    cast(to_char(datum, 'D') AS SMALLINT)                         AS week_day_number,

    -- DAY
    to_char(datum, 'Day')                                         AS day_name,
    CASE WHEN to_char(datum, 'D') IN ('1', '7')
      THEN 0
    ELSE 1 END                                                    AS day_is_weekday,
    CASE WHEN
      extract(DAY FROM (datum + (1 - extract(DAY FROM datum)) :: INTEGER +
                        INTERVAL '1' MONTH) :: DATE -
                       INTERVAL '1' DAY) = extract(DAY FROM datum)
      THEN 1
    ELSE 0 END                                                    AS day_is_last_of_month
  FROM
    -- Generate days for 81 years starting from 2000.
    (
      SELECT
        '2000-01-01' :: DATE + generate_series AS datum,
        generate_series                        AS seq
      FROM generate_series(0,81 * 365 + 20,1)
    ) DQ
  ORDER BY 1;

抛出这个错误

[Amazon](500310) Invalid operation: Specified types or functions (one per INFO message) not supported on Redshift tables.;
1 statement failed.

... 因为,我认为,在 Redshift 中的同一命令中不允许 INSERT 和 generate_series

... because, I assume, INSERT and generate_series are not allowed in the same command in Redshift

推荐答案

在问这个问题时,我想通了.糟糕.

In asking the question, I figured it out. Oops.

我从事实"模式开始.

CREATE SCHEMA facts;

运行以下命令以启动数字表:

Run the following to start a numbers table:

create table facts.numbers
(
  number int PRIMARY KEY
)
;

使用它来生成您的号码列表.我用了一百万开始

Use this to generate your number list. I used a million to get started

SELECT ',(' || generate_series(0,1000000,1) || ')'
;

然后将结果中的数字复制粘贴到以下查询中的 VALUES 之后:

Then copy-paste the numbers from your results in the query below, after VALUES:

INSERT INTO facts.numbers
VALUES
 (0)
,(1)
,(2)
,(3)
,(4)
,(5)
,(6)
,(7)
,(8)
,(9)
-- etc

^ 确保从复制粘贴的数字列表中删除前导逗号

^ Make sure to remove the leading comma from the copy-pasted list of numbers

一旦你有了一个数字表,你就可以生成一个日期表(同样,从 elliot land http://elliot.land/post/building-a-date-dimension-table-in-redshift ):

Once you have a numbers table, then you can generate a dates table (again, stealing code from elliot land http://elliot.land/post/building-a-date-dimension-table-in-redshift ) :

CREATE TABLE facts.dates (
  "date_id"              INTEGER                     NOT NULL PRIMARY KEY,

  -- DATE
  "full_date"            DATE                        NOT NULL,

  -- YEAR
  "year_number"          SMALLINT                    NOT NULL,
  "year_week_number"     SMALLINT                    NOT NULL,
  "year_day_number"      SMALLINT                    NOT NULL,

  -- QUARTER
  "qtr_number"           SMALLINT                    NOT NULL,

  -- MONTH
  "month_number"         SMALLINT                    NOT NULL,
  "month_name"           CHAR(9)                     NOT NULL,
  "month_day_number"     SMALLINT                    NOT NULL,

  -- WEEK
  "week_day_number"      SMALLINT                    NOT NULL,

  -- DAY
  "day_name"             CHAR(9)                     NOT NULL,
  "day_is_weekday"       SMALLINT                    NOT NULL,
  "day_is_last_of_month" SMALLINT                    NOT NULL
) DISTSTYLE ALL SORTKEY (date_id)
;


INSERT INTO facts.dates
(
   "date_id"
  ,"full_date"
  ,"year_number"
  ,"year_week_number"
  ,"year_day_number"

  -- QUARTER
  ,"qtr_number"

  -- MONTH
  ,"month_number"
  ,"month_name"
  ,"month_day_number"

  -- WEEK
  ,"week_day_number"

  -- DAY
  ,"day_name"
  ,"day_is_weekday"
  ,"day_is_last_of_month"
)
  SELECT
    cast(seq + 1 AS INTEGER)                                      AS date_id,

    -- DATE
    datum                                                         AS full_date,

    -- YEAR
    cast(extract(YEAR FROM datum) AS SMALLINT)                    AS year_number,
    cast(extract(WEEK FROM datum) AS SMALLINT)                    AS year_week_number,
    cast(extract(DOY FROM datum) AS SMALLINT)                     AS year_day_number,

    -- QUARTER
    cast(to_char(datum, 'Q') AS SMALLINT)                         AS qtr_number,

    -- MONTH
    cast(extract(MONTH FROM datum) AS SMALLINT)                   AS month_number,
    to_char(datum, 'Month')                                       AS month_name,
    cast(extract(DAY FROM datum) AS SMALLINT)                     AS month_day_number,

    -- WEEK
    cast(to_char(datum, 'D') AS SMALLINT)                         AS week_day_number,

    -- DAY
    to_char(datum, 'Day')                                         AS day_name,
    CASE WHEN to_char(datum, 'D') IN ('1', '7')
      THEN 0
    ELSE 1 END                                                    AS day_is_weekday,
    CASE WHEN
      extract(DAY FROM (datum + (1 - extract(DAY FROM datum)) :: INTEGER +
                        INTERVAL '1' MONTH) :: DATE -
                       INTERVAL '1' DAY) = extract(DAY FROM datum)
      THEN 1
    ELSE 0 END                                                    AS day_is_last_of_month
  FROM
    -- Generate days for 81 years starting from 2000.
    (
      SELECT
        '2000-01-01' :: DATE + number AS datum,
        number                        AS seq
      FROM facts.numbers
      WHERE number between 0 and 81 * 365 + 20
    ) DQ
  ORDER BY 1;

^ 请务必在末尾设置您需要的日期范围的数字

^ Be sure to set the numbers at the end for the date range you need

这篇关于如何在 Redshift 中创建日期表?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆