AWS Glue DynamicFrames and Push Down Predicate


Question

I am writing an ETL script for AWS Glue that reads JSON files stored in S3. In it I create a DynamicFrame and try to use push-down predicate logic to restrict the incoming data:

import datetime
import time

# Define the data restrictor predicate (epoch milliseconds, as strings)
now = str(int(round(time.time() * 1000)))
now_minus_7_date = datetime.datetime.now() - datetime.timedelta(days=7)
now_minus_7 = str(int(time.mktime(now_minus_7_date.timetuple()) * 1000))

last_7_predicate = "\"timestamp BETWEEN '" + now_minus_7 + "' AND '" + now + "'\""
print("Your predicate will be :" + last_7_predicate)

The table has multiple columns and is partitioned on RegionalCenter, Year, Month, Day, and Timestamp (all strings). The error message I am receiving is:

An error occurred while calling o70.getDynamicFrame. User's pushdown predicate: "timestamp BETWEEN '1550254844000' AND '1550859644703'" can not be resolved against partition columns: [regionalcenter,hour,year,timestamp,month,day]

I am new to AWS Glue and Spark, so I am very perplexed as to why the predicate on timestamp cannot be resolved against partition columns that do in fact include timestamp. I have ensured that the timestamps used in the table are in milliseconds. An example from our S3 structure would be:

regionalcenter=Missouri/Year=2019/Month=2/Day=11/Hour=22/Timestamp=1549924089246

The DynamicFrame code is as follows:

# Read data from table
dynamic_frame = glueContext.create_dynamic_frame.from_catalog(
    database = args['DatabaseName'],
    table_name = args['TableName'],
    transformation_ctx = 'dynamic_frame',
    push_down_predicate = last_7_predicate)

Please let me know what else might be helpful here. Being new to this, I am not entirely certain what else would be of value. Thank you.

Recommended answer

Ah, I was including too many quotes: the escaped \" characters put literal double quotes around the whole predicate. Consider this one resolved:

last_7_predicate = "timestamp between '" + now_minus_7 + "' AND '" + now + "'"
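For anyone who wants to avoid the same quoting slip, here is a minimal sketch of building the predicate with small helpers instead of hand-concatenated quotes. The function names build_between_predicate and last_n_days_window_ms are hypothetical and are not part of the Glue API; the sketch also assumes Python 3 and computes the window in UTC, which may differ from the local-time arithmetic in the question.

```python
import datetime


def build_between_predicate(column, start_ms, end_ms):
    """Return a BETWEEN predicate with quoted bounds and no outer quotes.

    Hypothetical helper for illustration; only the resulting string is
    meant to be handed to push_down_predicate.
    """
    return "{} BETWEEN '{}' AND '{}'".format(column, start_ms, end_ms)


def last_n_days_window_ms(days, now=None):
    """Return (start_ms, end_ms) epoch-millisecond bounds for the last `days` days.

    Assumption: UTC is used for "now" when no value is supplied.
    """
    now = now or datetime.datetime.now(datetime.timezone.utc)
    start = now - datetime.timedelta(days=days)
    return int(start.timestamp() * 1000), int(now.timestamp() * 1000)


start_ms, end_ms = last_n_days_window_ms(7)
last_7_predicate = build_between_predicate("timestamp", start_ms, end_ms)
print("Your predicate will be: " + last_7_predicate)
```

The resulting string can then be passed as push_down_predicate exactly as in the question's from_catalog call. Since the partition values are strings, the bounds are presumably compared as strings; that gives the same ordering as a numeric comparison as long as every timestamp has the same number of digits, as the 13-digit millisecond values here do.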
