如何在AWS Glue中指定联接类型? [英] How to specify join types in AWS Glue?

查看:160
本文介绍了如何在AWS Glue中指定联接类型?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用AWS Glue联接两个表.默认情况下,它执行INNER JOIN.我想做一个左外连接.我参考了AWS Glue文档,但是无法将联接类型传递给Join.apply()方法.是否可以在AWS Glue中实现这一目标?

I am using AWS Glue to join two tables. By default, it performs INNER JOIN. I want to do a LEFT OUTER JOIN. I referred the AWS Glue documentation but there is no way to pass the join type to the Join.apply() method. Is there a way to achieve this in AWS Glue?

## @type: Join
## @args: [keys1 = id, keys2 = "user_id"]
## @return: cUser
## @inputs: [frame1 = cUser0, frame2 = cUserLogins]
#cUser = Join.apply(frame1 = cUser0, frame2 = +, keys1 = "id", keys2 = "user_id", transformation_ctx = "<transformation_ctx>")


## @type: Join
## @args: [keys1 = id, keys2 = user_id]
## @return: datasource0
## @inputs: [frame1 = cUser, frame2 = cKKR]
datasource0 = Join.apply(frame1 = cUser0, frame2 = cKKR, keys1 = "id", keys2 = "user_id", transformation_ctx = "<transformation_ctx>")

## @type: Join
## @args: [keys1 = branch_id, keys2 = user_id]
## @return: datasource1
## @inputs: [frame1 = datasource0, frame2 = cBranch]
datasource1 = Join.apply(frame1 = datasource0, frame2 = cBranch, keys1 = "branch_id", keys2 = "user_id", transformation_ctx = "<transformation_ctx>")

推荐答案

当前,AWS Glue不支持LEFT和RIGHT连接.但是,我们仍然可以通过将DynamicFrame转换为DataFrame并使用join方法来实现.

Currently, LEFT and RIGHT joins are not supported by AWS Glue. But, we can still achieve it by converting the DynamicFrame to the DataFrame and using join method.

这里是示例:

cUser0 = glueContext.create_dynamic_frame.from_catalog(database = "captains", table_name = "cp_txn_winds_karyakarta_users", transformation_ctx = "cUser")

cUser0DF = cUser0.toDF()

cKKR = glueContext.create_dynamic_frame.from_catalog(database = "captains", table_name = "cp_txn_winds_karyakarta_karyakartas", redshift_tmp_dir = args["TempDir"], transformation_ctx = "cKKR")

cKKRDF = cKKR.toDF()

dataSource0 = cUser0DF.join(cKKRDF, cUser0DF.id == cKKRDF.user_id,how='left_outer')

这篇关于如何在AWS Glue中指定联接类型?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆