BigQuery: is it possible to execute another query inside a UDF?


Question

    I have a table that records one row per unique user per day, with some aggregated stats for that user on that day. I need to produce a report that tells me, for each day, the number of unique users in the last 30 days, including that day.

    For example:

    • for Aug 31st, it'll count the unique users from Aug 2nd to Aug 31st
    • for Aug 30th, it'll count the unique users from Aug 1st to Aug 30th
    • and so on...

    I've looked at some related questions, but they aren't quite what I need: if a user logs in on multiple days in the last 30 days, they should be counted only once, so I can't just sum the DAU counts for the last 30 days.

    Bigquery SQL for sliding window aggregate

    BigQuery SQL for 28-day sliding window aggregate (without writing 28 lines of SQL)
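
To make the overcounting concrete, here is a minimal Python sketch. The rows and user names are made up for illustration and are not from the actual table; it only shows that summing daily counts differs from counting distinct users over the same period.

```python
from datetime import date

# Hypothetical sample: one (day, user) row per user per active day.
rows = [
    (date(2016, 8, 29), "alice"),
    (date(2016, 8, 30), "alice"),
    (date(2016, 8, 30), "bob"),
    (date(2016, 8, 31), "alice"),
]

# Daily active users: the set of distinct users seen on each day.
dau = {}
for day, user in rows:
    dau.setdefault(day, set()).add(user)

# Summing the per-day counts counts "alice" three times.
summed_dau = sum(len(users) for users in dau.values())

# Counting distinct users over the whole period counts her once.
distinct_users = len({user for _, user in rows})

print(summed_dau)      # 4
print(distinct_users)  # 2
```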

    So far, my ideas are to either:

    • write a simple script that'll execute a separate BigQuery query for each of the relevant days
    • write a BigQuery UDF that'll execute basically the same query for each day selected from another query

    but I've not found any examples of how to execute another BigQuery query inside a UDF, or whether it's possible at all.

    Solution

    I need to produce a report that tells me for each day, the no. of unique users in the last 30 days including that day.

    The query below (BigQuery legacy SQL) should do this without a UDF:

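    -- Count each user once per calendar_day, however many days they were active.
    SELECT
      calendar_day,
      EXACT_COUNT_DISTINCT(userID) AS unique_users
    FROM (
      SELECT calendar_day, userID
      FROM YourTable
      -- Pair every row of YourTable with every report date.
      CROSS JOIN (
        -- pos runs from 1 to N, so calendar_day is the start date plus (pos - 1) days.
        SELECT DATE(DATE_ADD('2016-08-08', pos - 1, "DAY")) AS calendar_day
        FROM (
             SELECT ROW_NUMBER() OVER() AS pos, *
             -- RPAD/SPLIT/FLATTEN yield one row per day in the report range.
             FROM (FLATTEN((
             SELECT SPLIT(RPAD('', 1 + DATEDIFF('2016-09-08', '2016-08-08'), '.'),'') AS h
             FROM (SELECT NULL)),h
        )))
      ) AS calendar
      -- Keep only rows whose dt falls in the 30 days ending on calendar_day.
      WHERE DATEDIFF(calendar_day, dt) BETWEEN 0 AND 29
    )
    GROUP BY calendar_day
    ORDER BY calendar_day DESC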
    

    It assumes YourTable has userID and dt fields, for example:

    dt          userID   
    2016-09-08       1   
    2016-09-08       2   
    ...
    

    You can control:
    - the reporting date range, by changing the start and end dates 2016-08-08 and 2016-09-08
    - the window size, by changing 29 in BETWEEN 0 AND 29 (29 gives a 30-day window that includes the current day)
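
For readers who want to check the logic outside BigQuery, the following is a rough in-memory Python sketch of what the query computes. The function name `rolling_unique_users` and the sample rows are hypothetical and not part of the original answer; the sketch mirrors the calendar, the cross join, and the date-difference filter, and makes no attempt to be efficient.

```python
from datetime import date, timedelta


def rolling_unique_users(rows, start, end, window_days=30):
    """Count distinct users per calendar day over a trailing window.

    rows: iterable of (dt, user_id) pairs, mirroring YourTable's dt and userID.
    start, end: first and last report dates (inclusive).
    window_days: window length; 30 corresponds to BETWEEN 0 AND 29.
    """
    rows = list(rows)

    # Build the calendar: one date per day from start to end (inclusive),
    # playing the role of the ROW_NUMBER / DATE_ADD subquery.
    n_days = (end - start).days + 1
    calendar = [start + timedelta(days=i) for i in range(n_days)]

    result = {}
    for calendar_day in calendar:
        # Equivalent of the CROSS JOIN plus the DATEDIFF filter:
        # keep rows dated 0 to window_days - 1 days before calendar_day.
        users = {
            user
            for dt, user in rows
            if 0 <= (calendar_day - dt).days <= window_days - 1
        }
        # As in the query, days with no matching rows are omitted.
        if users:
            result[calendar_day] = len(users)
    return result


# Hypothetical sample data.
rows = [
    (date(2016, 8, 1), 1),
    (date(2016, 8, 30), 1),
    (date(2016, 8, 30), 2),
    (date(2016, 8, 31), 3),
]
report = rolling_unique_users(rows, date(2016, 8, 30), date(2016, 8, 31))

# Print newest day first, like ORDER BY calendar_day DESC.
for day in sorted(report, reverse=True):
    print(day, report[day])
```

With this sample, Aug 30 covers Aug 1 to Aug 30 and sees users 1 and 2, while Aug 31 covers Aug 2 to Aug 31 and sees users 1, 2 and 3, since the Aug 1 row has dropped out of the window.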
