Impala GROUP BY partitioned column

Problem description

Theoretical question

Let's say I have a table with four columns: A, B, C, D. The values of A and D are equal, and the table is partitioned by column A.

Performance-wise, would it make any difference if I issue this query: SELECT SUM(B) GROUP BY A; or this one: SELECT SUM(B) GROUP BY D;

In other words: is there any performance gain from using GROUP BY on the partitioned column?

Thanks

Recommended answer

Usually there are performance gains if you use the partition columns in a filter (the WHERE clause in your SQL), because Impala can then skip the partitions that cannot match (partition pruning).

Since both queries perform a full table scan, there should not be much difference between them. You might see a difference if there are a lot of partitions (around 50K, for example), which tends to degrade query performance, but that is not usually the case.
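To make the distinction concrete, here is a toy Python model. It is only an illustration under stated assumptions, not how Impala actually executes queries: the partitioned table is represented as a hypothetical dict mapping each value of A to its rows, and the function names sum_b_group_by and sum_b_where_a are made up for this sketch. Both GROUP BY variants read every row and return the same totals, while a filter on the partition column reads only the matching partition.

```python
from collections import defaultdict

# Hypothetical toy model: a table partitioned by column A is stored as one
# list of rows per distinct value of A. Each row is a tuple (A, B, C, D)
# where, as in the question, D always equals A.
partitions = {
    1: [(1, 10, "x", 1), (1, 5, "y", 1)],
    2: [(2, 7, "z", 2)],
    3: [(3, 1, "w", 3), (3, 2, "v", 3)],
}

def sum_b_group_by(col_index):
    """SUM(B) GROUP BY <column>: visits every partition (full table scan)."""
    rows_read = 0
    totals = defaultdict(int)
    for rows in partitions.values():
        for row in rows:
            rows_read += 1
            totals[row[col_index]] += row[1]  # row[1] is column B
    return dict(totals), rows_read

def sum_b_where_a(value):
    """SUM(B) WHERE A = value: only the matching partition is read (pruning)."""
    rows = partitions.get(value, [])
    return sum(row[1] for row in rows), len(rows)

by_a, read_a = sum_b_group_by(0)  # GROUP BY A (partition column)
by_d, read_d = sum_b_group_by(3)  # GROUP BY D (regular column, same values)
print(by_a == by_d, read_a, read_d)  # True 5 5
print(sum_b_where_a(2))              # (7, 1)
```

In this model, grouping by the partition column does not reduce the number of rows read; only the WHERE filter on A does.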
