AWS Glue predicate push down condition has no effect


Question

I have a MySQL source from which I am creating a Glue DynamicFrame with a pushdown predicate, as follows:

datasource = glueContext.create_dynamic_frame_from_catalog(
    database = source_catalog_db, 
    table_name = source_catalog_tbl, 
    push_down_predicate = "id > 1531812324", 
    transformation_ctx = "datasource")

No matter what condition I put in 'push_down_predicate', I always get all the records in 'datasource'. What am I missing?

Recommended answer

Pushdown predicates work for partition columns only. In other words, your data files must be placed in hierarchically structured folders. For example, if the data is located in s3://bucket/dataset/ and partitioned by year, month and day, the structure should be as follows:

s3://bucket/dataset/year=2018/month=7/day=18/<data-files-here>

In that case the pushdown predicate works for the columns year, month and day only:

datasource = glueContext.create_dynamic_frame_from_catalog(
    database = source_catalog_db, 
    table_name = source_catalog_tbl, 
    push_down_predicate = "year = 2017 and month > 6 and day between 3 and 10", 
    transformation_ctx = "datasource")
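To illustrate why only partition columns can appear in the predicate, here is a minimal sketch of partition pruning in plain Python. It is not how Glue implements it; the helper names (`parse_partitions`, `matches_predicate`, `prune`) and the sample paths are hypothetical, and partition values are assumed to be integers. The point is that the decision is made from folder names alone, before any data file is opened, so a column such as id that lives only inside the files cannot take part.

```python
from typing import Dict, List


def parse_partitions(path: str) -> Dict[str, int]:
    """Extract Hive-style key=value folder names from a path (integer values assumed)."""
    partitions = {}
    for segment in path.split("/"):
        key, sep, value = segment.partition("=")
        if sep and key and value.isdigit():
            partitions[key] = int(value)
    return partitions


def matches_predicate(p: Dict[str, int]) -> bool:
    # Hand-written equivalent of:
    #   year = 2017 and month > 6 and day between 3 and 10
    # (SQL BETWEEN is inclusive on both ends.)
    return p["year"] == 2017 and p["month"] > 6 and 3 <= p["day"] <= 10


def prune(paths: List[str]) -> List[str]:
    """Keep only the partition folders whose names satisfy the predicate."""
    return [path for path in paths if matches_predicate(parse_partitions(path))]


# Hypothetical partition folders, following the layout shown above.
paths = [
    "s3://bucket/dataset/year=2017/month=7/day=5/",
    "s3://bucket/dataset/year=2017/month=6/day=5/",
    "s3://bucket/dataset/year=2017/month=8/day=11/",
    "s3://bucket/dataset/year=2018/month=7/day=18/",
]

print(prune(paths))
```

Only the first folder survives; the other three are skipped without reading their contents.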

Besides that, keep in mind that pushdown predicates work with S3 data sources only. That is why the condition on id has no effect on your MySQL source.
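For a MySQL source, one possible workaround is to filter the records after the DynamicFrame has been created, using Glue's `Filter` transform. The sketch below assumes the table has a numeric id column, as in the question; `ID_THRESHOLD`, `keep_record` and `filter_after_load` are hypothetical names. As far as I understand, this filters inside the Spark job, so it does not reduce the amount of data read from MySQL.

```python
ID_THRESHOLD = 1531812324  # value taken from the question


def keep_record(record) -> bool:
    """Row-level condition equivalent to: id > 1531812324."""
    return record["id"] > ID_THRESHOLD


def filter_after_load(dynamic_frame):
    """Apply the condition to an already loaded DynamicFrame."""
    # Imported lazily: the awsglue package is only available inside a Glue job.
    from awsglue.transforms import Filter

    return Filter.apply(frame=dynamic_frame, f=keep_record)
```

Inside a Glue job this would be called as `filtered = filter_after_load(datasource)`.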

There is also a nice blog post about data partitioning written by AWS Glue developers.
