Upsert from AWS Glue to Amazon Redshift

Question

I understand that there is no UPSERT query that can be run directly from Glue against Redshift. Is it possible to implement the staging-table approach within the Glue script itself?

So what I have in mind is creating the staging table, merging it into the destination table, and finally dropping it. Can this be done within the Glue script?

Recommended Answer

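It is possible to implement an upsert into Redshift with a staging table in Glue by passing the 'postactions' option to the JDBC sink. The sink loads the data into the staging table, and the SQL in 'postactions' then merges it into the destination table: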

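```scala
import com.amazonaws.services.glue.util.JsonOptions

// Assumes glueContext (GlueContext), datasetDf (a DynamicFrame) and
// tempDir (an S3 path) are defined earlier in the job.
val destinationTable = "upsert_test"
val destination = s"dev_sandbox.${destinationTable}"
val staging = s"dev_sandbox.${destinationTable}_staging"

// Comma-separated column list used by the INSERT ... SELECT below
val fields = datasetDf.toDF().columns.mkString(",")

// Runs in Redshift after the staging table has been loaded: delete target rows
// that match staging rows on (id, date), insert every staging row, then drop
// the staging table. BEGIN/END makes the delete and insert one transaction,
// so a failed insert cannot leave the target with rows deleted.
val postActions =
  s"""
     BEGIN;
     DELETE FROM $destination USING $staging AS S
        WHERE $destinationTable.id = S.id
          AND $destinationTable.date = S.date;
     INSERT INTO $destination ($fields) SELECT $fields FROM $staging;
     DROP TABLE IF EXISTS $staging;
     END
  """

// Write data to the staging table in Redshift; postactions run afterwards
glueContext.getJDBCSink(
  catalogConnection = "redshift-glue-connections-test",
  options = JsonOptions(Map(
    "database" -> "conndb",
    "dbtable" -> staging,
    "overwrite" -> "true",
    "postactions" -> postActions
  )),
  redshiftTmpDir = s"$tempDir/redshift",
  transformationContext = "redshift-output"
).writeDynamicFrame(datasetDf)
```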
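The post-action SQL in the answer hardcodes the key columns `id` and `date`. The sketch below is a hypothetical helper, `buildUpsertSql` (not part of the Glue API), that generates the same delete-then-insert statements for any list of key columns. It assumes column and table names need no quoting, and it wraps the merge in `BEGIN`/`END` so the delete and insert commit together. It only builds the string; the result would be passed as the 'postactions' value.

```scala
// Hypothetical helper, not part of AWS Glue: builds post-action SQL for a
// delete-then-insert upsert from a staging table into a target table.
object UpsertSql {
  /**
   * @param destinationSchema schema of the target table, e.g. "dev_sandbox"
   * @param destinationTable  unqualified target table name, e.g. "upsert_test"
   * @param stagingTable      fully qualified staging table name
   * @param columns           all columns copied from staging to target
   * @param keyColumns        columns that identify a row (the upsert key)
   */
  def buildUpsertSql(
      destinationSchema: String,
      destinationTable: String,
      stagingTable: String,
      columns: Seq[String],
      keyColumns: Seq[String]
  ): String = {
    require(keyColumns.nonEmpty, "at least one key column is required")
    require(keyColumns.forall(k => columns.contains(k)), "every key column must be in columns")

    val destination = s"$destinationSchema.$destinationTable"
    val fields = columns.mkString(", ")
    // Match target rows to staging rows on every key column.
    val joinCondition = keyColumns
      .map(k => s"$destinationTable.$k = S.$k")
      .mkString(" AND ")

    Seq(
      "BEGIN",
      // Remove target rows that are about to be replaced by staging rows.
      s"DELETE FROM $destination USING $stagingTable AS S WHERE $joinCondition",
      // Insert all staging rows, both updated and brand-new ones.
      s"INSERT INTO $destination ($fields) SELECT $fields FROM $stagingTable",
      // Clean up the staging table inside the same transaction.
      s"DROP TABLE IF EXISTS $stagingTable",
      "END"
    ).mkString(";\n") // no trailing semicolon after END, as in the answer
  }

  def main(args: Array[String]): Unit = {
    val sql = buildUpsertSql(
      destinationSchema = "dev_sandbox",
      destinationTable = "upsert_test",
      stagingTable = "dev_sandbox.upsert_test_staging",
      columns = Seq("id", "date", "value"),
      keyColumns = Seq("id", "date")
    )
    println(sql)
  }
}
```

Because every staging row is inserted after the matching target rows are deleted, this pattern assumes the staging data contains no duplicate keys; if it might, deduplicate the data before writing it to the staging table.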

Make sure the user used for writing to Redshift has sufficient permissions to create/drop tables in the staging schema.

这篇关于从AWS Glue升级到Amazon Redshift的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆