Getting deduped conversions per unique user with a 24-hour window


Question

I need to get deduped conversions for each unique user. The rule is that I need a column that counts only the first conversion a user makes within a day. So a user can trigger 10 conversions on 3/3/2019, but the 'Deduped' column should count only 1 for that day. The code should scale to terabytes of data.

This is my original data in BigQuery:

Date    User_ID Total_Conversions
3/3/19  1234    1
3/3/19  1234    1
3/3/19  1234    1
3/3/19  12      1
3/3/19  12      1
3/4/19  1234    1
3/4/19  1234    1
3/5/19  1       1
3/6/19  1       0

I want my final output to look like this:

Date     User_ID   Total_Conversions   Deduped
3/3/19   1234          3                   1
3/3/19   12            2                   1
3/5/19   1             1                   1
3/4/19   1234          2                   1
3/6/19   1             0                   0

Recommended answer

I think you just need a basic GROUP BY query here:

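SELECT
    date,
    User_ID,
    -- Total conversions for this user on this date
    SUM(Total_Conversions) AS Total_Conversions,
    -- 1 if the user converted at least once that day, otherwise 0
    CASE WHEN SUM(Total_Conversions) > 0 THEN 1 ELSE 0 END AS Deduped
FROM yourTable
GROUP BY
    date,
    User_ID;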

Demo

(Demo shown in MySQL just for illustrative purposes)

This assumes that, logically, the Deduped column is 1 for any positive number of conversions in a date and user group, and 0 only when no conversions happened in that group.
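As a quick sanity check, the GROUP BY approach can be run against the sample rows from the question. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for BigQuery (an assumption, not part of the original answer), keeps the placeholder table name yourTable, and adds an ORDER BY purely so the output order is deterministic.

```python
import sqlite3

# Sample rows copied from the question: (date, User_ID, Total_Conversions)
rows = [
    ("3/3/19", 1234, 1),
    ("3/3/19", 1234, 1),
    ("3/3/19", 1234, 1),
    ("3/3/19", 12, 1),
    ("3/3/19", 12, 1),
    ("3/4/19", 1234, 1),
    ("3/4/19", 1234, 1),
    ("3/5/19", 1, 1),
    ("3/6/19", 1, 0),
]

# In-memory SQLite database standing in for BigQuery (assumption)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (date TEXT, User_ID INTEGER, Total_Conversions INTEGER)"
)
conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", rows)

# Same aggregation as the answer; ORDER BY added only for stable output
query = """
SELECT
    date,
    User_ID,
    SUM(Total_Conversions) AS Total_Conversions,
    CASE WHEN SUM(Total_Conversions) > 0 THEN 1 ELSE 0 END AS Deduped
FROM yourTable
GROUP BY date, User_ID
ORDER BY date, User_ID
"""
result = conn.execute(query).fetchall()

for row in result:
    print(row)
```

Each printed tuple is (date, User_ID, Total_Conversions, Deduped). The user with no conversions on 3/6/19 gets a Deduped value of 0, while every other date and user pair collapses to 1 regardless of how many conversions it had.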
