How to add records for each user based on another existing row in BigQuery?
Problem description
Posting here in case someone with more knowledge than me is able to point me in the right direction.
I have a table like this:
| Row | date     | user id | score |
------------------------------------
| 1   | 20201120 | 1       | 26    |
| 2   | 20201121 | 1       | 14    |
| 3   | 20201125 | 1       | 0     |
| 4   | 20201114 | 2       | 32    |
| 5   | 20201116 | 2       | 0     |
| 6   | 20201120 | 2       | 23    |
From this, I need one record per user per day. If a day is missing for a user, the last recorded score should be carried forward, so the result would look like this:
| Row | date     | user id | score |
------------------------------------
| 1   | 20201120 | 1       | 26    |
| 2   | 20201121 | 1       | 14    |
| 3   | 20201122 | 1       | 14    |
| 4   | 20201123 | 1       | 14    |
| 5   | 20201124 | 1       | 14    |
| 6   | 20201125 | 1       | 0     |
| 7   | 20201114 | 2       | 32    |
| 8   | 20201115 | 2       | 32    |
| 9   | 20201116 | 2       | 0     |
| 10  | 20201117 | 2       | 0     |
| 11  | 20201118 | 2       | 0     |
| 12  | 20201119 | 2       | 0     |
| 13  | 20201120 | 2       | 23    |
I'm trying to do this in BigQuery using Standard SQL. I have an idea of how to keep the same score across the following empty dates, but I really don't know how to add new rows for the missing dates for each user. Also, keep in mind that this example has only 2 users, but my data has more than 1500.
My end goal is to show something like the average score per day. For background: under our logic, if no score was recorded on a specific day, the user still holds the last recorded score, which is why I need a score for every user on every day.
I'd really appreciate any help I could get! I've been trying different options without success.
Recommended answer
The following is for BigQuery Standard SQL:
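#standardSQL
-- Carry each user's last recorded score forward over the days that have no row.
select date, user_id,
  last_value(score ignore nulls) over(partition by user_id order by date) as score
from (
  -- One row per user per calendar day between that user's first and last recorded dates.
  select user_id, format_date('%Y%m%d', day) as date
  from (
    select user_id,
      min(parse_date('%Y%m%d', date)) as min_date,
      max(parse_date('%Y%m%d', date)) as max_date
    from `project.dataset.table`
    group by user_id
  ) a, unnest(generate_date_array(min_date, max_date)) as day
)
-- Generated days with no source row get a null score here; the window function fills them.
left join `project.dataset.table` b
using(date, user_id)
-- order by user_id, date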
If applied to the sample data from your question, the output matches the expected result shown there: one row per user per day, with the last recorded score carried forward.
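For the stated end goal (an average score per day), the same gap-filling logic could be wrapped in CTEs and aggregated. The following is only a sketch under a few assumptions: the CTE names `scores`, `calendar`, and `filled` and the output columns `avg_score` and `user_count` are made up for illustration, `date` is assumed to be stored as a 'YYYYMMDD' string, and the user column is assumed to be named `user_id`. The sample rows are inlined so the query can be run without an existing table.

```sql
#standardSQL
-- Sample rows copied from the question; `scores` is a stand-in for `project.dataset.table`.
with scores as (
  select '20201120' as date, 1 as user_id, 26 as score union all
  select '20201121', 1, 14 union all
  select '20201125', 1, 0 union all
  select '20201114', 2, 32 union all
  select '20201116', 2, 0 union all
  select '20201120', 2, 23
),
-- One row per user per calendar day between that user's first and last recorded dates.
calendar as (
  select user_id, format_date('%Y%m%d', day) as date
  from (
    select user_id,
      min(parse_date('%Y%m%d', date)) as min_date,
      max(parse_date('%Y%m%d', date)) as max_date
    from scores
    group by user_id
  ) bounds, unnest(generate_date_array(min_date, max_date)) as day
),
-- Days without a source row get a null score from the left join;
-- last_value(... ignore nulls) replaces it with the latest earlier non-null score.
filled as (
  select date, user_id,
    last_value(score ignore nulls) over (partition by user_id order by date) as score
  from calendar
  left join scores using (date, user_id)
)
-- Average over the users that have a (recorded or carried-forward) score on each day.
select date, avg(score) as avg_score, count(*) as user_count
from filled
group by date
order by date
```

With the sample data, 20201120 is the only day on which both users have a row (26 and 23), so its average is 24.5; every other day reflects a single user. Note that each user is only filled between their own first and last recorded dates. If a user's last score should keep counting until a common end date, one option would be to replace `max_date` with the overall latest date in the table, which is a design choice that depends on your logic.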