How to push down project, filter, aggregation to TableScan in Calcite

Question

I am using Apache Calcite to implement a distributed OLAP system whose data source is an RDBMS, so I want to push the project/filter/aggregation nodes in the RelNode tree down into MyTableScan, which extends TableScan. Inside MyTableScan, a RelBuilder would receive the pushed-down RelNodes, and finally that RelBuilder would generate the query sent to the source database. At the same time, the project/filter/aggregation nodes in the original RelNode tree should be removed or modified.

As far as I know, Calcite does not support this feature.

Current limitations: The JDBC adapter currently only pushes down table scan operations; all other processing (filtering, joins, aggregations and so forth) occurs within Calcite. Our goal is to push down as much processing as possible to the source system, translating syntax, data types and built-in functions as we go. If a Calcite query is based on tables from a single JDBC database, in principle the whole query should go to that database. If tables are from multiple JDBC sources, or a mixture of JDBC and non-JDBC, Calcite will use the most efficient distributed query approach that it can.

In my opinion, a RelOptRule may be a good choice. Unfortunately, when I create a new RelOptRule, I cannot easily find the parent node of the node I want to remove.

Is a RelOptRule a good choice? Does anyone have a good idea for implementing this feature?

Thanks.

Recommended answer

Creating a new RelOptRule is the way to go. Note that you shouldn't try to remove any nodes directly inside a rule. Instead, you match a subtree that contains the nodes you want to replace (for example, a Filter on top of a TableScan), and then replace that entire subtree with an equivalent node which pushes down the filter. The planner takes care of attaching the replacement to its parent, so the rule never needs to find the parent node.
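To make the "match a subtree, return an equivalent node" idea concrete, here is a minimal toy model in plain Java. It deliberately does not use the Calcite API: `Node`, `Scan`, `Filter`, `Limit`, `pushFilterIntoScan` and `rewrite` are all hypothetical stand-ins invented for this sketch (roughly playing the roles of RelNode, a custom TableScan, a Filter, some parent operator, the rule, and the planner). The point it tries to show is that the rule only inspects the matched subtree and hands back a replacement, while the tree walker reconnects the parent.

```java
/** Toy model of rule-based push-down; none of these types are Calcite classes. */
public class PushDownSketch {

    /** A plan node; a hypothetical stand-in for RelNode. */
    interface Node {}

    /** A scan that may carry a pushed-down predicate (null when nothing was pushed). */
    static final class Scan implements Node {
        final String table;
        final String pushedFilter;
        Scan(String table, String pushedFilter) {
            this.table = table;
            this.pushedFilter = pushedFilter;
        }
        @Override public String toString() {
            return pushedFilter == null
                ? "Scan(" + table + ")"
                : "Scan(" + table + ", filter=" + pushedFilter + ")";
        }
    }

    /** A filter evaluated above its input. */
    static final class Filter implements Node {
        final String condition;
        final Node input;
        Filter(String condition, Node input) {
            this.condition = condition;
            this.input = input;
        }
        @Override public String toString() {
            return "Filter(" + condition + ", " + input + ")";
        }
    }

    /** Another operator, so the matched subtree has a parent above it. */
    static final class Limit implements Node {
        final int n;
        final Node input;
        Limit(int n, Node input) {
            this.n = n;
            this.input = input;
        }
        @Override public String toString() {
            return "Limit(" + n + ", " + input + ")";
        }
    }

    /**
     * The "rule": matches a Filter directly on top of a Scan that has no
     * pushed filter yet and returns an equivalent Scan. Returns null when
     * the pattern does not match. It never looks at the parent node.
     */
    static Node pushFilterIntoScan(Node node) {
        if (node instanceof Filter) {
            Filter f = (Filter) node;
            if (f.input instanceof Scan && ((Scan) f.input).pushedFilter == null) {
                Scan s = (Scan) f.input;
                return new Scan(s.table, f.condition);
            }
        }
        return null;
    }

    /** The "planner": rebuilds the tree bottom-up and applies the rule at each node. */
    static Node rewrite(Node node) {
        Node rebuilt = node;
        if (node instanceof Filter) {
            Filter f = (Filter) node;
            rebuilt = new Filter(f.condition, rewrite(f.input));
        } else if (node instanceof Limit) {
            Limit l = (Limit) node;
            rebuilt = new Limit(l.n, rewrite(l.input));
        }
        Node replacement = pushFilterIntoScan(rebuilt);
        return replacement != null ? replacement : rebuilt;
    }

    public static void main(String[] args) {
        Node plan = new Limit(10, new Filter("amount > 100", new Scan("orders", null)));
        // Prints: Limit(10, Scan(orders, filter=amount > 100))
        System.out.println(rewrite(plan));
    }
}
```

In real Calcite code the same roles are played by a rule's operand pattern and its `onMatch` method, with the planner performing the substitution; the adapter sources mentioned in this answer show the actual API for a given Calcite version.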

This is normally handled by creating a subclass of the relevant operation which conforms to the calling convention of the particular adapter. For example, in the Cassandra adapter, there is a CassandraFilterRule which matches a LogicalFilter on top of a CassandraTableScan. The convert function then constructs a CassandraFilter instance. The CassandraFilter instance sets up the necessary information so that when the query is actually issued, the filter is available.
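As a rough illustration of a node that "sets up the necessary information" for the moment the query is issued, the sketch below records pushed-down operations and renders them as SQL for the source database. It is again a toy in plain Java, not Calcite or Cassandra adapter code: the class `PushedScanSketch` and its methods `filter`, `groupBy`, `project` and `toSql` are hypothetical names, and it assumes filters always apply before the aggregation (so they become WHERE, never HAVING).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy stand-in for an adapter scan node that records pushed-down operations. */
public class PushedScanSketch {
    private final String table;
    private final List<String> columns = new ArrayList<>();
    private final List<String> predicates = new ArrayList<>();
    private final List<String> groupKeys = new ArrayList<>();

    public PushedScanSketch(String table) {
        this.table = table;
    }

    /** Records a predicate; assumed to apply before any aggregation. */
    public PushedScanSketch filter(String predicate) {
        predicates.add(predicate);
        return this;
    }

    /** Records the grouping keys of a pushed-down aggregation. */
    public PushedScanSketch groupBy(String... keys) {
        groupKeys.clear();
        groupKeys.addAll(Arrays.asList(keys));
        return this;
    }

    /** Records the output expressions (plain columns or aggregate calls). */
    public PushedScanSketch project(String... exprs) {
        columns.clear();
        columns.addAll(Arrays.asList(exprs));
        return this;
    }

    /** Renders the recorded state as one SQL statement for the source database. */
    public String toSql() {
        StringBuilder sql = new StringBuilder("SELECT ");
        sql.append(columns.isEmpty() ? "*" : String.join(", ", columns));
        sql.append(" FROM ").append(table);
        if (!predicates.isEmpty()) {
            List<String> wrapped = new ArrayList<>();
            for (String p : predicates) {
                wrapped.add("(" + p + ")"); // parenthesize so AND-joining is safe
            }
            sql.append(" WHERE ").append(String.join(" AND ", wrapped));
        }
        if (!groupKeys.isEmpty()) {
            sql.append(" GROUP BY ").append(String.join(", ", groupKeys));
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        String sql = new PushedScanSketch("orders")
            .filter("amount > 100")
            .groupBy("region")
            .project("region", "SUM(amount)")
            .toSql();
        // Prints: SELECT region, SUM(amount) FROM orders WHERE (amount > 100) GROUP BY region
        System.out.println(sql);
    }
}
```

A real adapter would keep expressions as RexNode trees and translate them for the target dialect rather than store raw strings; strings are used here only to keep the sketch short.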

Browsing some of the code for the Cassandra, MongoDB, or Elasticsearch adapters may be helpful as they are on the simpler side. I would also suggest bringing this to the mailing list as you'll probably get more detailed advice there.