Join by nearest date for the table with duplicate records in BigQuery
Problem description
I have an installs table in which several installs share the same user_id but have different install_date values.
I want every revenue record joined with the nearest install record, by install_date, that is earlier than revenue_date, because I need its source field value for further processing.
That means the output row count should equal the number of records in the revenue table.
How can this be achieved in BigQuery?
Here is the data:
installs
install_date  user_id  source
--------------------------------
2020-01-10    user_a   source_I
2020-01-15    user_a   source_II
2020-01-20    user_a   source_III
***info about other users***

revenue
revenue_date  user_id  revenue
--------------------------------
2020-01-11    user_a   10
2020-01-21    user_a   20
***info about other users***
Recommended answer
Consider the following solution:
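select
  any_value(r).*,                            -- all columns of the revenue row
  array_agg(
    (select as struct i.* except(user_id))   -- install columns, without a second user_id
    order by install_date desc               -- latest earlier install first
    limit 1
  )[offset(0)].*
from `project.dataset.revenue` r
join `project.dataset.installs` i
  on i.user_id = r.user_id
  and install_date < revenue_date            -- only installs strictly before the revenue date
group by format('%t', r)                     -- one group per revenue row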
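If applied to the sample data in your question, the output for user_a is:

revenue_date  user_id  revenue  install_date  source
------------------------------------------------------
2020-01-11    user_a   10       2020-01-10    source_I
2020-01-21    user_a   20       2020-01-20    source_III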
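The join rule can also be checked outside BigQuery. The following is a minimal Python sketch, not BigQuery code, that applies the same rule (latest install strictly before revenue_date, per user_id) to the user_a sample rows. The function name join_nearest_prior_install is hypothetical and chosen only for illustration. It mirrors the inner join in the query, so a revenue row with no earlier install is dropped.

```python
from bisect import bisect_left
from collections import defaultdict

# Sample rows from the question (user_a only).
# ISO-formatted dates compare correctly as plain strings.
installs = [
    ("2020-01-10", "user_a", "source_I"),
    ("2020-01-15", "user_a", "source_II"),
    ("2020-01-20", "user_a", "source_III"),
]
revenue = [
    ("2020-01-11", "user_a", 10),
    ("2020-01-21", "user_a", 20),
]


def join_nearest_prior_install(revenue_rows, install_rows):
    """Pair each revenue row with the latest install strictly before revenue_date."""
    by_user = defaultdict(list)
    for install_date, user_id, source in install_rows:
        by_user[user_id].append((install_date, source))
    for rows in by_user.values():
        rows.sort()  # ascending by install_date

    joined = []
    for revenue_date, user_id, amount in revenue_rows:
        rows = by_user.get(user_id, [])
        dates = [d for d, _ in rows]
        # bisect_left returns how many install dates are < revenue_date
        pos = bisect_left(dates, revenue_date)
        if pos == 0:
            continue  # no earlier install: dropped, as with the inner join
        install_date, source = rows[pos - 1]
        joined.append((revenue_date, user_id, amount, install_date, source))
    return joined


result = join_nearest_prior_install(revenue, installs)
for row in result:
    print(row)
```

Because the sketch follows the inner join, the output row count equals the revenue row count only when every revenue record has at least one earlier install for the same user.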