How to calculate retention month over month using SQL
Problem description
I'm trying to build a basic table that shows retention from one month to the next: if someone buys something one month and buys again the following month, they get counted.
month, num_transactions, repeat_transactions, retention
2012-02, 5, 2, 40%
2012-03, 10, 3, 30%
2012-04, 15, 8, 53%
So if everyone that bought last month bought again the following month you have 100%.
So far I can only calculate stuff manually. This gives me the rows that have been seen in both months:
select count(*) as num_repeat_buyers from
    (select distinct
        to_char(transaction.timestamp, 'YYYY-MM') as month,
        auth_user.email
    from
        auth_user,
        transaction
    where
        auth_user.id = transaction.buyer_id and
        to_char(transaction.timestamp, 'YYYY-MM') = '2012-03'
    ) as table1,
    (select distinct
        to_char(transaction.timestamp, 'YYYY-MM') as month,
        auth_user.email
    from
        auth_user,
        transaction
    where
        auth_user.id = transaction.buyer_id and
        to_char(transaction.timestamp, 'YYYY-MM') = '2012-04'
    ) as table2
where table1.email = table2.email
This is not right, but I feel like I could use some of Postgres' window functions. Keep in mind that window functions don't let you specify WHERE clauses; you mostly have access to the preceding and following rows:
select month, count(*) as num_transactions, count(*) over (PARTITION BY month ORDER BY month)
from
    (select distinct
        to_char(transaction.timestamp, 'YYYY-MM') as month,
        auth_user.email
    from
        auth_user,
        transaction
    where
        auth_user.id = transaction.buyer_id
    order by
        month
    ) as transactions_by_month
group by
    month
Recommended answer
Given the following test table (which you should have provided):
CREATE TEMP TABLE transaction (buyer_id int, tstamp timestamp);
INSERT INTO transaction VALUES
(1,'2012-01-03 20:00')
,(1,'2012-01-05 20:00')
,(1,'2012-01-07 20:00') -- multiple transactions this month
,(1,'2012-02-03 20:00') -- next month
,(1,'2012-03-05 20:00') -- next month
,(2,'2012-01-07 20:00')
,(2,'2012-03-07 20:00') -- not next month
,(3,'2012-01-07 20:00') -- just once
,(4,'2012-02-07 20:00'); -- just once
Table auth_user is not relevant to the problem.
I use tstamp as the column name, since I don't use base type names as identifiers.
I am going to use the window function lag() to identify repeat buyers. To keep it short, I combine aggregate and window functions in one query level. Bear in mind that window functions are applied after aggregate functions.
WITH t AS (
   -- one row per buyer and month
   SELECT buyer_id
         ,date_trunc('month', tstamp) AS month
         ,count(*) AS item_transactions
          -- TRUE if this buyer's previous active month is exactly one month earlier;
          -- OR NULL turns FALSE into NULL so count() skips it later
         ,lag(date_trunc('month', tstamp)) OVER (PARTITION BY buyer_id
                                                 ORDER BY date_trunc('month', tstamp))
          = date_trunc('month', tstamp) - interval '1 month'
            OR NULL AS repeat_transaction
   FROM   transaction
   WHERE  tstamp >= '2012-01-01'::date
   AND    tstamp <  '2012-05-01'::date -- time range of interest.
   GROUP  BY 1, 2
   )
SELECT month
      ,sum(item_transactions) AS num_trans
      ,count(*) AS num_buyers
      ,count(repeat_transaction) AS repeat_buyers
      ,round(
          CASE WHEN sum(item_transactions) > 0
             THEN count(repeat_transaction) / sum(item_transactions) * 100
             ELSE 0
          END, 2) AS buyer_retention_pct
FROM   t
GROUP  BY 1
ORDER  BY 1;
Result:
  month  | num_trans | num_buyers | repeat_buyers | buyer_retention_pct
---------+-----------+------------+---------------+--------------------
 2012-01 |         5 |          3 |             0 |                0.00
 2012-02 |         2 |          2 |             1 |               50.00
 2012-03 |         2 |          2 |             1 |               50.00
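If you want to cross-check those numbers outside the database, here is a rough Python sketch that recomputes the same table from the test rows. It is not part of the SQL solution. The names month_index, per_buyer_month and summary are made up for illustration, and the check "was this buyer also active in the previous month" stands in for the lag() comparison, which should be equivalent here because each buyer has at most one row per month after grouping.

```python
from collections import defaultdict
from datetime import datetime

# Same rows as the test table: (buyer_id, tstamp)
rows = [
    (1, "2012-01-03 20:00"), (1, "2012-01-05 20:00"), (1, "2012-01-07 20:00"),
    (1, "2012-02-03 20:00"), (1, "2012-03-05 20:00"),
    (2, "2012-01-07 20:00"), (2, "2012-03-07 20:00"),
    (3, "2012-01-07 20:00"),
    (4, "2012-02-07 20:00"),
]

def month_index(ts):
    """Map a timestamp string to a running month number (hypothetical helper)."""
    d = datetime.strptime(ts, "%Y-%m-%d %H:%M")
    return d.year * 12 + d.month - 1

# Step 1 (the CTE): number of transactions per (buyer, month)
per_buyer_month = defaultdict(int)
for buyer_id, ts in rows:
    per_buyer_month[(buyer_id, month_index(ts))] += 1

# Step 2 (the outer SELECT): aggregate per month
summary = {}
for (buyer_id, m), n in sorted(per_buyer_month.items()):
    s = summary.setdefault(m, {"num_trans": 0, "num_buyers": 0, "repeat_buyers": 0})
    s["num_trans"] += n
    s["num_buyers"] += 1
    # Repeat buyer: the same buyer also has a row for the month before
    if (buyer_id, m - 1) in per_buyer_month:
        s["repeat_buyers"] += 1

result = []
for m in sorted(summary):
    s = summary[m]
    # Same ratio as the query: repeat buyers over transactions, in percent
    pct = round(100 * s["repeat_buyers"] / s["num_trans"], 2)
    label = f"{m // 12}-{m % 12 + 1:02d}"
    result.append((label, s["num_trans"], s["num_buyers"], s["repeat_buyers"], pct))

for row in result:
    print(row)
```

Note that the percentage divides by the number of transactions, as the query does. Dividing by num_buyers instead would give a per-buyer rate.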
I extended your question to provide for the difference between the number of transactions and the number of buyers.
The OR NULL for repeat_transaction serves to convert FALSE to NULL, so those values do not get counted by count() in the next step.
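To see the effect of OR NULL in isolation, here is a small sketch using Python's built-in sqlite3 module as a stand-in for Postgres. That is an assumption for convenience: SQLite stores booleans as 0/1 integers, but it applies the same three-valued logic to OR. The table name flags and the column is_repeat are invented for this demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 1 = true, 0 = false, NULL = unknown (e.g. no previous month for lag())
conn.execute("CREATE TABLE flags (is_repeat INTEGER)")
conn.executemany(
    "INSERT INTO flags VALUES (?)",
    [(1,), (0,), (None,), (1,), (0,)],
)

# count(expr) counts every non-NULL value, so FALSE would be counted too.
# expr OR NULL yields TRUE for TRUE and NULL for both FALSE and NULL,
# which leaves only the TRUE rows for count().
plain, with_or_null = conn.execute(
    "SELECT count(is_repeat), count(is_repeat OR NULL) FROM flags"
).fetchone()

print(plain, with_or_null)
```

The first count includes the two false rows, the second only the two true ones, which is the behavior the query relies on for repeat_buyers.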