What is the difference between "predicate pushdown" and "projection pushdown"?

Question

I have come across several sources of information, such as the one found here, which explain "predicate pushdown" as:

… if you can "push down" parts of the query to where the data is stored, and thus filter out most of the data, then you can greatly reduce network traffic.

However, I have also seen the term "projection pushdown" in other documentation, such as here. It appears to be the same thing, but I am not sure I understand it correctly.

Is there a specific difference between the two terms?

Recommended answer

Predicate refers to the where/filter clause, which affects the number of rows returned.

Projection refers to the selected columns.

For example:

If your filters pass only 5% of the rows, only 5% of the table will be passed from the storage to Spark instead of the full table.

If your projection selects only 3 columns out of 10, then fewer columns will be passed from the storage to Spark, and if your storage is columnar (e.g. Parquet, not Avro) and the non-selected columns are not part of the filter, then these columns won't even have to be read.
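To make the two ideas concrete, here is a minimal sketch in plain Python. It is a toy model of a columnar store, not how Spark or Parquet actually implement pushdown; the class `ColumnarTable`, its `scan` method, and the column names `c0` to `c9` are all hypothetical and exist only for this illustration. The table has 10 columns and 100 rows, the filter passes 5% of the rows, and the projection keeps 3 columns, matching the numbers above.

```python
from typing import Callable, Dict, List, Optional, Sequence


class ColumnarTable:
    """Toy column-oriented table (hypothetical): one Python list per column."""

    def __init__(self, columns: Dict[str, List[int]]) -> None:
        self.columns = columns
        self.num_rows = len(next(iter(columns.values())))
        self.values_read = 0  # every cell the "storage layer" touches

    def _read(self, column: str, row: int) -> int:
        self.values_read += 1
        return self.columns[column][row]

    def scan(
        self,
        projection: Sequence[str],
        predicate_column: Optional[str] = None,
        predicate: Optional[Callable[[int], bool]] = None,
    ) -> List[Dict[str, int]]:
        """Return rows as dicts, reading only the cells that are needed."""
        result = []
        for row in range(self.num_rows):
            # Predicate pushdown: evaluate the filter inside the storage
            # layer, so rejected rows are never materialized or shipped.
            if predicate is not None and predicate_column is not None:
                if not predicate(self._read(predicate_column, row)):
                    continue
            # Projection pushdown: read only the requested columns.
            result.append({name: self._read(name, row) for name in projection})
        return result


def make_table() -> ColumnarTable:
    # 10 columns, 100 rows; c0 holds the row index so "c0 < 5" keeps 5%.
    return ColumnarTable(
        {f"c{i}": [row + 1000 * i for row in range(100)] for i in range(10)}
    )


wanted = ["c1", "c2", "c3"]

# No pushdown: storage hands over every cell, the engine filters and projects.
full = make_table()
all_rows = full.scan(projection=list(full.columns))
naive_result = [{c: r[c] for c in wanted} for r in all_rows if r["c0"] < 5]

# Both pushdowns: storage applies the filter and reads only 3 columns.
pushed = make_table()
pushed_result = pushed.scan(
    projection=wanted, predicate_column="c0", predicate=lambda v: v < 5
)

print(naive_result == pushed_result)  # True: same answer either way
print(full.values_read)    # 1000 cells read (100 rows x 10 columns)
print(pushed.values_read)  # 115 cells read (100 for the filter + 5 rows x 3)
```

Both scans return the same 5 rows, but the pushed-down scan touches the filter column plus only the cells it actually returns. The filter column `c0` still has to be read even though it is not selected, which is the caveat in the answer about non-selected columns that are part of the filter.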
