Find Top 1000 entries along with count and rank from table
Question
I have a table with around 30 billion rows in Redshift, with the following structure:
userid itemid country start_date
uid1 itemid1 country1 2018-07-25 00:00:00
uid2 itemid2 country1 2018-07-25 00:00:00
uid3 itemid1 country2 2018-07-25 00:00:00
uid4 itemid3 country1 2018-07-25 00:00:00
uid5 itemid1 country1 2018-07-25 00:00:00
uid1 itemid2 country2 2018-07-25 00:00:00
uid2 itemid2 country2 2018-07-25 00:00:00
I want to find how many unique users bought each item, and then pick the top 1000 most-sold items for each country and start_date. Both the rank and the number of times each item was sold are required.
The expected output is as follows:
itemid country sold_count start_date
itemid1 country1 2 2018-07-25 00:00:00
itemid2 country2 2 2018-07-25 00:00:00
itemid1 country2 1 2018-07-25 00:00:00
itemid2 country1 1 2018-07-25 00:00:00
itemid3 country1 1 2018-07-25 00:00:00
I am trying to use the rank function, but I am not getting the expected result.
I am trying the following query:
select itemid, start_date, Rank() over (partition by itemid order by
count(distinct(userid)) desc) as rank1
from table_name
group by item_id, start_date
order by rank1 desc;
Also, I want a column with the count of unique userids that bought each item_id, grouped by country and start_date. In the above query, I have ignored the country column to simplify the query.
Please help me.
Recommended answer
If I assume that "version" means "country", then I think you want:
select *
from (select itemid, country, start_date, count(distinct userid) as num_users,
             row_number() over (partition by country, start_date
                                order by count(distinct userid) desc
                               ) as seqnum
      from table_name
      -- group by itemid, the column name shown in the table structure
      group by itemid, country, start_date
     ) x
where seqnum <= 1000;
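To check the idea on the sample rows from the question, here is a minimal sketch that runs an equivalent query in an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for Redshift here and needs version 3.25 or later for window functions. The names sold_count, item_rank, top_items, and the top_n parameter are illustrative assumptions, not part of the original answer. The sketch counts distinct users in one step, then adds rank() for the requested rank (tied items share a rank) and row_number() with an itemid tie-breaker to cap each group at N rows.

```python
import sqlite3

# Sample rows copied from the question: (userid, itemid, country, start_date).
rows = [
    ("uid1", "itemid1", "country1", "2018-07-25 00:00:00"),
    ("uid2", "itemid2", "country1", "2018-07-25 00:00:00"),
    ("uid3", "itemid1", "country2", "2018-07-25 00:00:00"),
    ("uid4", "itemid3", "country1", "2018-07-25 00:00:00"),
    ("uid5", "itemid1", "country1", "2018-07-25 00:00:00"),
    ("uid1", "itemid2", "country2", "2018-07-25 00:00:00"),
    ("uid2", "itemid2", "country2", "2018-07-25 00:00:00"),
]

# Step 1 (counts): distinct buyers per item, country and start_date.
# Step 2 (ranked): rank() gives tied items the same rank; row_number()
# with an itemid tie-breaker yields a deterministic cut-off at top_n.
QUERY = """
with counts as (
    select itemid, country, start_date,
           count(distinct userid) as sold_count
    from table_name
    group by itemid, country, start_date
),
ranked as (
    select counts.*,
           rank() over (partition by country, start_date
                        order by sold_count desc) as item_rank,
           row_number() over (partition by country, start_date
                              order by sold_count desc, itemid) as seqnum
    from counts
)
select itemid, country, start_date, sold_count, item_rank
from ranked
where seqnum <= :top_n
order by country, start_date, seqnum
"""


def top_items(conn, top_n=1000):
    """Return the top_n items per (country, start_date) with count and rank."""
    return conn.execute(QUERY, {"top_n": top_n}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table table_name "
    "(userid text, itemid text, country text, start_date text)"
)
conn.executemany("insert into table_name values (?, ?, ?, ?)", rows)

result = top_items(conn)
for row in result:
    print(row)
```

On the sample data this gives itemid1 a count of 2 and rank 1 in country1, with itemid2 and itemid3 tied at rank 2. If ties at the cut-off should all be kept, filtering on item_rank instead of seqnum is one option, at the cost of possibly returning more than N rows per group.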