Why do I need to set the `transformation_ctx` parameter when calling transformation and sink operations for AWS Glue bookmarks to work?


Question

The AWS Glue bookmark documentation (https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html) seems to suggest that one has to pass a transformation_ctx parameter to the source, transform and sink operations for the bookmark to work. This is reflected in the sample code on that page, where every call to create_dynamic_frame.from_catalog(), ApplyMapping.apply() and write_dynamic_frame.from_options() is passed a transformation_ctx value.

I can understand the point of passing a transformation_ctx to the create_dynamic_frame.from_catalog() method, as AWS Glue needs to store information about the files that have already been read in the bookmark under the given transformation_ctx key.

However, I don't understand why this is also necessary for methods like ApplyMapping.apply() and write_dynamic_frame.from_options(). To put it another way, what state information do these operations need to store in the bookmark? If I don't pass transformation_ctx to these methods, what problems will that cause?

Recommended answer

I had the same doubts about bookmarking some months ago (October 2019), and since the documentation provided by Amazon is not very clear, I opened a support case to better understand how it is implemented.

My Glue job contained:

  • A read function from S3 (glue_context.create_dynamic_frame.from_options)
  • A ResolveChoice.apply
  • A write function to Redshift (glue_context.write_dynamic_frame.from_jdbc_conf)
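
A job with the three steps listed above might look roughly like the sketch below. It is only an illustration under stated assumptions: the function name run_pipeline, the JSON input format, the "make_struct" choice, the example_table and example_db names and the three transformation_ctx strings are all hypothetical and do not come from the original job. In a real job this code would also sit between Job.init() and job.commit(), which bookmarks rely on to persist their state.

```python
def run_pipeline(glue_context, source_path, redshift_connection, redshift_tmp_dir):
    """Read from S3, resolve ambiguous column types, write to Redshift.

    All argument values are hypothetical placeholders supplied by the caller.
    """
    # Imported lazily so this module can be loaded outside a Glue environment.
    from awsglue.transforms import ResolveChoice

    # Source: according to the answer, this is the only step whose
    # transformation_ctx matters for bookmarking.
    source = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": [source_path]},
        format="json",  # assumed input format
        transformation_ctx="read_s3",
    )

    # Transform: the context string is passed here as in the AWS samples.
    resolved = ResolveChoice.apply(
        frame=source,
        choice="make_struct",  # assumed resolution strategy
        transformation_ctx="resolve_choice",
    )

    # Sink: again passed only to mirror the documented pattern.
    glue_context.write_dynamic_frame.from_jdbc_conf(
        frame=resolved,
        catalog_connection=redshift_connection,
        connection_options={"dbtable": "example_table", "database": "example_db"},
        redshift_tmp_dir=redshift_tmp_dir,
        transformation_ctx="write_redshift",
    )
    return resolved
```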

All of these operations had a transformation_ctx value, and I tested different possible behaviours (the same transformation_ctx for all of them, different fixed values, dynamic values, etc.).

After many messages with AWS support, they confirmed that bookmarking works only on the read function (they also said it works only with S3 as a source, but I didn't test that). So I asked whether the transformation_ctx is useless in ResolveChoice (and in the write function too), and they said yes: they confirmed that it doesn't make any difference.

Furthermore, for the write function it doesn't change anything: there is no bookmark logic and no "avoid function" that skips the write if it has already been run before.
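
The behaviour described in this answer can be pictured with a small toy model in plain Python. This is not how AWS Glue is implemented; ToyBookmarkStore and run_job are hypothetical names invented for the illustration. The model keeps state only for the read step, keyed by its context string, while the transform and the sink hold no state at all.

```python
class ToyBookmarkStore:
    """Toy stand-in for bookmark state: context key -> set of files already read."""

    def __init__(self):
        self._seen = {}

    def unread(self, ctx, files):
        """Return the files not yet recorded under the given context key."""
        seen = self._seen.get(ctx, set())
        return [f for f in files if f not in seen]

    def commit(self, ctx, files):
        """Record the files as processed under the given context key."""
        self._seen.setdefault(ctx, set()).update(files)


def run_job(store, available_files, read_ctx, sink):
    """Simulate one job run: bookmarked read, stateless transform, stateless write."""
    new_files = store.unread(read_ctx, available_files)  # only the read consults state
    records = [name.upper() for name in new_files]       # transform: no state involved
    sink.extend(records)                                 # sink: appends whatever arrives
    store.commit(read_ctx, new_files)
    return new_files


store = ToyBookmarkStore()
sink = []
run_job(store, ["a.json", "b.json"], "read_s3", sink)            # first run reads both files
run_job(store, ["a.json", "b.json", "c.json"], "read_s3", sink)  # second run reads only c.json
```

In this model the sink never checks whether a record was written before. Duplicates are avoided only because the read step skips files it has already seen, so running with a different read context would send the same files to the sink again.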
