继续使用现有表,直到月底为止包含预测数据并每天更新 [英] Continue existing table until end of month with forecasted data and update daily

查看:37
本文介绍了继续使用现有表,直到月底为止包含预测数据并每天更新的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想在Google Big Query中使用现有的每日收入数据创建一个新表,并使用基于现有数据并需要创建的预测数据扩展该新表.一旦存在某天的新实际数据,它将覆盖该天的预测数据.此外,直到月底的预测数据都会再次更新.

I'd like to create a new table in Google Big Query with existing daily revenue data and extend this new table with forecast data which is based on the existing data and needs to be created. Once new actual data exists for a certain day it overrides the forecast data for that day. Also, the forecast data until the end of the month is then updated again.

到目前为止,我想出了以下内容,它会生成一条错误消息:Scalar subquery produced more than one element

So far, I came up with the following, which generates an error message: Scalar subquery produced more than one element

    SELECT
        date, sum(yl_revenue), 'ACTUAL' as type 
        from project.dataset.table 
        where date >"2020-01-01" and date < current_date() 
        group by date 
        union distinct

        SELECT 
        (select calendar_date 
    FROM 
UNNEST(GENERATE_DATE_ARRAY('2020-01-01', DATE_SUB(DATE_TRUNC(DATE_ADD(CURRENT_DATE(), INTERVAL 1 MONTH), MONTH), INTERVAL 1 DAY), INTERVAL 1 DAY)) 
AS calendar_date), 
        avg(revenue_daily) as average_daily_revenue, 
        'FORECAST' as type FROM 
            (SELECT sum(revenue) as revenue_daily from project.dataset.table 
    WHERE date > "2020-01-01" and extract(month from date) = extract (month from current_date()) group by date)

我希望数据看起来如何:

How I wish the data looks like:

+------------+------------+----------+
|    date    |  revenue   |   type   |
+------------+------------+----------+
| 01.04.2020 | 100 €      | ACTUAL   |
| …          | 5.000 €    | ACTUAL   |
| 23.04.2020 | 200 €      | ACTUAL   |
| 24.04.2020 |  230,43 €  | FORECAST |
| 25.04.2020 |  230,43 €  | FORECAST |
| 26.04.2020 |  230,43 €  | FORECAST |
| 27.04.2020 |  230,43 €  | FORECAST |
| 28.04.2020 |  230,43 €  | FORECAST |
| 29.04.2020 |  230,43 €  | FORECAST |
| 30.04.2020 |  230,43 €  | FORECAST |
+------------+------------+----------+

第二天(2020年4月24日)应该如下所示:

On the next day (24.04.2020) it should look like this:

+------------+--------------+----------+
|    date    |   revenue    |   type   |
+------------+--------------+----------+
| 01.04.2020 | 100 €        | ACTUAL   |
| …          | 5.000 €      | ACTUAL   |
| 23.04.2020 | 200 €        | ACTUAL   |
| 24.04.2020 |  1.000,00 €  | ACTUAL   | <----
| 25.04.2020 |  262,50 €    | FORECAST |
| 26.04.2020 |  262,50 €    | FORECAST |
| 27.04.2020 |  262,50 €    | FORECAST |
| 28.04.2020 |  262,50 €    | FORECAST |
| 29.04.2020 |  262,50 €    | FORECAST |
| 30.04.2020 |  262,50 €    | FORECAST |
+------------+--------------+----------+

预测值只是该月实际收入的总和除以该月到目前为止的天数.请注意,第二个表中的每日预测值已更改,因为已将新的实际值添加到其中.

The forecast value is simply the sum of the actual revenue of the month divided by the number of days the month had so far. Notice, that the daily forecast value changed in the second table as a new actual value was added to it.

感谢您提供有关如何解决此问题的帮助!

Any help on how to approach this is much appreciated!

谢谢

Jan

推荐答案

我找到了解决问题的方法. (尽管它可能不是最复杂的)

I found a solution to my problem. (Although it may not be the most sophisticated one)

我现在想出了3个新表:

I now came up with 3 new tables:

  1. 提供过去&未来的日期,这就是为什么我称其为日历"
  2. 提供当月的收入数据.我覆盖了这个 每天都有一个预定查询的表,它提供了实际的过去 数据和预测的未来数据(基于 月),直到当前月末.
  3. 提供过去的数据(可追溯到比当前月份更长的时间)以及每日数据 来自2的更新数据).我也为此使用预定的MERGE查询.
  1. provides past & future dates, which is why I called it 'calendar'
  2. provides revenue data for the current month. I overwrite this table every day with a scheduled query, which provides actual past data and forecasted future data (based on the actual data of the month) until the end of the current month.
  3. provides past data (dating back longer than just the current month), plus the daily updated data from 2). I use a scheduled MERGE query for this one, too.

以下是各个查询:

1)

SELECT
  *
FROM
  UNNEST(GENERATE_DATE_ARRAY('2018-01-01', '2030-12-31', INTERVAL 1 DAY)) AS calendar_date
WITH
OFFSET
  AS
OFFSET
ORDER BY
OFFSET

2)

SELECT
  date,
  'actual' AS type,
  ROUND(SUM(revenue),2)
FROM
  `project.dataset.revenue_data` 
WHERE
  EXTRACT(year
  FROM
    date) = EXTRACT (year
  FROM
    CURRENT_DATE())
  AND EXTRACT(month
  FROM
    date) = EXTRACT (month
  FROM
    CURRENT_DATE())
GROUP BY
  date
UNION DISTINCT
SELECT
  calendar_date,
  'forecast',
  (
  SELECT
    ROUND(AVG(revenue_daily),2)
  FROM (
    SELECT
      SUM(revenue) AS revenue_daily
    FROM
      `project.dataset.revenue_data`
    WHERE
      EXTRACT(year
      FROM
        date) = EXTRACT (year
      FROM
        CURRENT_DATE())
      AND EXTRACT(month
      FROM
        date) = EXTRACT (month
      FROM
        CURRENT_DATE())
    GROUP BY
      date
    ORDER BY
      date) AS average_daily_revenue),
FROM
  `project.dataset.calendar`
WHERE
  calendar_date >= CURRENT_DATE()
  AND calendar_date <=DATE_SUB(DATE_TRUNC(DATE_ADD(CURRENT_DATE(), INTERVAL 1 MONTH), MONTH), INTERVAL 1 DAY)
ORDER BY
  date

3)

MERGE
  `project.dataset.forecast_table` f
USING
  `project.dataset.forecast_month` m
ON
  f.date = m.date
  WHEN MATCHED THEN UPDATE SET f.type = m.type, f.revenue = m.revenue
  WHEN NOT MATCHED
  AND m.date >= CURRENT_DATE() THEN
INSERT
  (date,
    type,
    revenue)
VALUES
  (date, type, revenue)

这篇关于继续使用现有表,直到月底为止包含预测数据并每天更新的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆