Impala GROUP BY partitioned column
Question
Theoretical question
Let's say I have a table with four columns: A, B, C, D. The values of A and D are equal, and the table is partitioned by column A.
Performance-wise, would it make any difference if I issue this query: SELECT SUM(B) GROUP BY A; or this one: SELECT SUM(B) GROUP BY D;
In other words, is there any performance gain from using GROUP BY on the partitioned column?
Thanks
Recommended answer
Usually there are performance gains when you filter on the partition column (in the WHERE clause of your SQL).
Since both queries perform a full table scan, there should not be much difference between them. You might see a difference if there are a lot of partitions (around 50K, say), which tends to degrade query performance, but that is not usually the case.
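One way to check this on your own data is to compare the plans with EXPLAIN. The following is only a sketch: the table name `t`, the column types, the Parquet storage format, and the literal `42` are assumptions made for illustration, not details from the question. In the scan node of each plan, look at the partitions line: both GROUP BY queries should read every partition, while the query that filters on the partition column should read only the matching one.

```sql
-- Hypothetical table matching the question: partitioned by a, with d holding the same values as a.
CREATE TABLE t (b BIGINT, c STRING, d INT)
PARTITIONED BY (a INT)
STORED AS PARQUET;

-- Grouping on the partition column: no WHERE clause, so all partitions are scanned.
EXPLAIN SELECT a, SUM(b) FROM t GROUP BY a;

-- Grouping on a regular column with the same values: also scans all partitions.
EXPLAIN SELECT d, SUM(b) FROM t GROUP BY d;

-- Filtering on the partition column: only the matching partition should be scanned.
EXPLAIN SELECT SUM(b) FROM t WHERE a = 42;
```

If the first two plans show the same number of partitions and files scanned, any remaining difference between them comes from the aggregation step, not from the amount of data read.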