Hive QL - Limiting the number of rows per item
Problem description
If I have multiple items listed in a WHERE clause, how would one go about limiting the results to N for each item in the list?
Example:
select a_id, b, c, count(*) as sumrequests
from table_name
where
a_id in (1,2,3)
group by a_id,b,c
limit 10000
It sounds like you want the top N rows per a_id. You can do this with a window function such as row_number(); windowing functions were introduced in Hive 0.11. Something like:
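SELECT a_id, b, c, count(*) AS sumrequests
FROM (
-- Number the rows within each a_id. The alias is rn because ROW is a reserved word in newer Hive versions.
SELECT a_id, b, c, row_number() OVER (PARTITION BY a_id) AS rn
FROM table_name
) rs
WHERE rn <= 10000
AND a_id IN (1, 2, 3)
GROUP BY a_id, b, c;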
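This keeps up to 10,000 rows per a_id before grouping. Without an ORDER BY in the window, which rows are kept is arbitrary rather than truly random, and each sumrequests count only reflects the rows that were kept. You can partition it further (add more columns to PARTITION BY) if you want the limit to apply to more than just a_id. You can also use ORDER BY in the window function; there are a lot of examples out there showing additional options.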
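The two approaches mentioned above (numbering raw rows before grouping, versus ordering inside the window) can be compared with a minimal sketch. It rests on several assumptions: it uses Python's built-in sqlite3 module as a local stand-in for Hive (SQLite 3.25 or later is needed for window functions), the rows in `ROWS` are made-up sample data, and `run`, `SAMPLE_THEN_GROUP` and `GROUP_THEN_TOP_N` are hypothetical names. `SAMPLE_THEN_GROUP` mirrors the answer's query. `GROUP_THEN_TOP_N` is one possible reading of "use ORDER BY in the window function": it aggregates first and then keeps the N groups with the highest counts per a_id. The SQL should carry over to Hive 0.11+ with the `:n` placeholder replaced by a literal, but it is untested on Hive.

```python
import sqlite3

# Hypothetical sample rows (a_id, b, c); a_id 4 should be excluded by the IN list.
ROWS = [
    (1, "x", "p"), (1, "x", "p"), (1, "x", "p"),
    (1, "y", "p"), (1, "y", "p"),
    (1, "z", "q"),
    (2, "x", "p"),
    (2, "y", "q"), (2, "y", "q"),
    (3, "x", "p"),
    (4, "x", "p"),
]

# Mirrors the answer: number raw rows per a_id (order unspecified), keep the
# first N, then aggregate. Counts only cover the rows that were kept.
SAMPLE_THEN_GROUP = """
SELECT a_id, b, c, count(*) AS sumrequests
FROM (
    SELECT a_id, b, c, row_number() OVER (PARTITION BY a_id) AS rn
    FROM table_name
) rs
WHERE rn <= :n
  AND a_id IN (1, 2, 3)
GROUP BY a_id, b, c
ORDER BY a_id, b, c
"""

# Variant (not from the answer): aggregate first, then rank the groups inside
# each a_id by count and keep the top N. Ties are broken by b, c so the output
# is deterministic.
GROUP_THEN_TOP_N = """
SELECT a_id, b, c, sumrequests
FROM (
    SELECT a_id, b, c, sumrequests,
           row_number() OVER (PARTITION BY a_id
                              ORDER BY sumrequests DESC, b, c) AS rn
    FROM (
        SELECT a_id, b, c, count(*) AS sumrequests
        FROM table_name
        WHERE a_id IN (1, 2, 3)
        GROUP BY a_id, b, c
    ) grouped
) ranked
WHERE rn <= :n
ORDER BY a_id, sumrequests DESC, b, c
"""


def run(query, n):
    """Load ROWS into an in-memory table_name and run query with N bound to :n."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE table_name (a_id INTEGER, b TEXT, c TEXT)")
        conn.executemany("INSERT INTO table_name VALUES (?, ?, ?)", ROWS)
        return conn.execute(query, {"n": n}).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Top 2 (b, c) groups per a_id, with their full counts.
    for row in run(GROUP_THEN_TOP_N, 2):
        print(row)
```

With this sample data, `run(GROUP_THEN_TOP_N, 2)` returns the two busiest (b, c) groups for a_id 1, both groups for a_id 2 and the single group for a_id 3, each with its full count, whereas the counts from `SAMPLE_THEN_GROUP` only cover the rows that survived the `rn <= N` filter. Swapping row_number() for rank() would keep all groups tied at the cutoff, at the cost of sometimes returning more than N per a_id.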