How to add records for each user based on another existing row in BigQuery?


Question

Posting here in case someone with more knowledge than I have may be able to give me some direction.

I have a table like this:

| Row |   date   |user id | score |
-----------------------------------
|  1  | 20201120 |  1     |   26  |
-----------------------------------
|  2  | 20201121 |  1     |   14  |
-----------------------------------
|  3  | 20201125 |  1     |   0   |
-----------------------------------
|  4  | 20201114 |  2     |   32  |
-----------------------------------
|  5  | 20201116 |  2     |   0   |
-----------------------------------
|  6  | 20201120 |  2     |   23  |
-----------------------------------

However, I need a record for each user for each day. If a day is missing for a user, the last recorded score should be carried forward, so I would end up with something like this:

| Row |   date   |user id | score |
-----------------------------------
|  1  | 20201120 |  1     |   26  |
-----------------------------------
|  2  | 20201121 |  1     |   14  |
-----------------------------------
|  3  | 20201122 |  1     |   14  |
-----------------------------------
|  4  | 20201123 |  1     |   14  |
-----------------------------------
|  5  | 20201124 |  1     |   14  |
-----------------------------------
|  6  | 20201125 |  1     |   0   |
-----------------------------------
|  7  | 20201114 |  2     |   32  |
-----------------------------------
|  8  | 20201115 |  2     |   32  |
-----------------------------------
|  9  | 20201116 |  2     |   0   |
-----------------------------------
|  10 | 20201117 |  2     |   0   |
-----------------------------------
|  11 | 20201118 |  2     |   0   |
-----------------------------------
|  12 | 20201119 |  2     |   0   |
-----------------------------------
|  13 | 20201120 |  2     |   23  |
-----------------------------------

I'm trying to do this in BigQuery using Standard SQL. I have an idea of how to keep the same score across the empty dates that follow, but I really don't know how to add new rows for the missing dates for each user. Also keep in mind that this example has only 2 users, but my data has more than 1500.

My end goal is to show something like the average score per day. For background: under our logic, if no score was recorded on a specific day, the user is still at the last recorded score, which is why I need a score for every user on every day.

I'd really appreciate any help I can get! I've been trying different options without success.

Recommended answer

The query below is for BigQuery Standard SQL:

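#standardSQL
-- assumes `date` is stored as a STRING in YYYYMMDD form, as parse_date requires
select date, user_id,
  -- carry the last non-null score forward over the generated days
  last_value(score ignore nulls) over(partition by user_id order by date) as score
from (
  -- one row per user per day between that user's first and last recorded date
  select user_id, format_date('%Y%m%d', day) date
  from (
    select user_id, min(parse_date('%Y%m%d', date)) min_date, max(parse_date('%Y%m%d', date)) max_date
    from `project.dataset.table`
    group by user_id
  ) a, unnest(generate_date_array(min_date, max_date)) day
)
-- days with no recorded score come back from the join with a null score
left join `project.dataset.table` b
using(date, user_id)
-- order by user_id, date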
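
The part of the query that creates the missing rows can be looked at on its own. The sketch below runs only that step against an inline copy of the question's sample data. The CTE names `scores` and `bounds` are made up for illustration, and it assumes `date` is a STRING in YYYYMMDD form.

```sql
#standardSQL
-- Hypothetical inline copy of the question's sample data; swap in your own table.
with scores as (
  select '20201120' as date, 1 as user_id, 26 as score union all
  select '20201121', 1, 14 union all
  select '20201125', 1, 0 union all
  select '20201114', 2, 32 union all
  select '20201116', 2, 0 union all
  select '20201120', 2, 23
),
bounds as (
  -- first and last recorded day per user
  select user_id,
    min(parse_date('%Y%m%d', date)) as min_date,
    max(parse_date('%Y%m%d', date)) as max_date
  from scores
  group by user_id
)
-- generate_date_array is inclusive on both ends, so each user gets
-- one row for every calendar day from min_date through max_date
select user_id, format_date('%Y%m%d', day) as date
from bounds, unnest(generate_date_array(min_date, max_date)) as day
order by user_id, date
```

On the sample data this produces 13 rows: 6 for user 1 (20201120 through 20201125) and 7 for user 2 (20201114 through 20201120). No score is attached yet; the left join and `last_value(... ignore nulls)` in the full query supply it.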

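Applied to the sample data from your question, the full query returns the table you asked for: one row per user per day, with each missing day filled by that user's last recorded score.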
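For the stated end goal, the average score per day, one possible follow-up is to wrap the filled result in a CTE and aggregate by date. This is only a sketch on an inline copy of the sample data. The names `scores`, `filled`, `avg_score` and `users_counted` are hypothetical, and it again assumes `date` is a STRING in YYYYMMDD form.

```sql
#standardSQL
-- Hypothetical inline copy of the question's sample data; swap in your own table.
with scores as (
  select '20201120' as date, 1 as user_id, 26 as score union all
  select '20201121', 1, 14 union all
  select '20201125', 1, 0 union all
  select '20201114', 2, 32 union all
  select '20201116', 2, 0 union all
  select '20201120', 2, 23
),
filled as (
  -- same gap-filling logic as the answer's query
  select date, user_id,
    last_value(score ignore nulls) over (partition by user_id order by date) as score
  from (
    select user_id, format_date('%Y%m%d', day) as date
    from (
      select user_id,
        min(parse_date('%Y%m%d', date)) as min_date,
        max(parse_date('%Y%m%d', date)) as max_date
      from scores
      group by user_id
    ) a, unnest(generate_date_array(min_date, max_date)) as day
  )
  left join scores using (date, user_id)
)
-- average over the users that have a (real or carried-forward) score that day
select date, avg(score) as avg_score, count(*) as users_counted
from filled
group by date
order by date
```

Note that each user is only filled between their own first and last recorded dates, so a day's average covers only the users whose range includes that day. In the sample data, 20201120 is the only day both ranges cover, so it is the only day that averages two users (26 and 23). If a score should keep counting after a user's last record, one option would be to replace `max_date` with a shared end date such as `current_date()`; whether that matches your logic is an assumption to check.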
