Extract Embedded AWS Glue Connection Credentials Using Scala
Question
I have a Glue job that reads directly from Redshift, which requires connection credentials. I have created an embedded Glue connection and can extract the credentials with the following PySpark code. Is there a way to do this in Scala?
import boto3

glue = boto3.client('glue', region_name='us-east-1')
response = glue.get_connection(
    Name='name-of-embedded-connection',
    HidePassword=False
)
table = spark.read.format(
    'com.databricks.spark.redshift'
).option(
    'url',
    'jdbc:redshift://prod.us-east-1.redshift.amazonaws.com:5439/db'
).option(
    'user',
    response['Connection']['ConnectionProperties']['USERNAME']
).option(
    'password',
    response['Connection']['ConnectionProperties']['PASSWORD']
).option(
    'dbtable',
    'db.table'
).option(
    'tempdir',
    's3://config/glue/temp/redshift/'
).option(
    'forward_spark_s3_credentials', 'true'
).load()
Recommended answer
AWS does not provide a Scala equivalent for this API call, but you can use the AWS SDK for Java from Scala, as mentioned in this answer.
The relevant Java SDK call is getConnection.
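A minimal Scala sketch of that getConnection call might look like the following. It assumes the AWS SDK for Java v1 (the com.amazonaws packages) is on the job's classpath and that the job's role is allowed to call glue:GetConnection. The object name GlueConnectionCredentials and its two methods are hypothetical names chosen for illustration. The property keys USERNAME and PASSWORD are taken from the PySpark code in the question.

```scala
import com.amazonaws.services.glue.AWSGlueClientBuilder
import com.amazonaws.services.glue.model.GetConnectionRequest

// Hypothetical helper object; not part of any AWS library.
object GlueConnectionCredentials {

  // Pull the user name and password out of a connection's property map.
  def credentials(props: java.util.Map[String, String]): (String, String) =
    (props.get("USERNAME"), props.get("PASSWORD"))

  // Fetch the connection through the SDK v1 Glue client and extract the credentials.
  def fetch(connectionName: String, region: String): (String, String) = {
    val glue = AWSGlueClientBuilder.standard().withRegion(region).build()
    val request = new GetConnectionRequest()
      .withName(connectionName)
      .withHidePassword(false) // ask Glue to include the password in the response
    credentials(glue.getConnection(request).getConnection.getConnectionProperties)
  }
}
```

Inside the Scala job this could be used as `val (user, pwd) = GlueConnectionCredentials.fetch("name-of-embedded-connection", "us-east-1")`.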
If you would rather not use the Java SDK, you can take the following approach instead:
- Create an AWS Glue Python shell job and retrieve the connection information. Once you have the values, start the Scala Glue job from inside the Python shell job, passing the values as arguments, as shown below:
import boto3

glue = boto3.client('glue', region_name='us-east-1')

# Fetch the connection with the password included in the response
connection = glue.get_connection(
    Name='name-of-embedded-connection',
    HidePassword=False
)
props = connection['Connection']['ConnectionProperties']

# Start the Scala job, passing the credentials as job arguments
run = glue.start_job_run(
    JobName='my_scala_Job',
    Arguments={
        '--username': props['USERNAME'],
        '--password': props['PASSWORD']
    }
)
- Then access these arguments inside the Scala job using getResolvedOptions, as shown below:
import com.amazonaws.services.glue.util.GlueArgParser
// sysArgs is the Array[String] passed to the job's main method
val args = GlueArgParser.getResolvedOptions(
  sysArgs,
  Array("username", "password")
)
val user = args("username")
val pwd = args("password")
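With user and pwd available, the Redshift read from the question could be ported to Scala roughly as follows. This is a sketch under the assumption that the same com.databricks.spark.redshift data source is available to the Scala job. The object name RedshiftRead and its methods are hypothetical, and the option values are copied from the PySpark snippet in the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object mirroring the PySpark reader options.
object RedshiftRead {

  // Build the reader options; all values other than the credentials come from the question.
  def options(user: String, pwd: String): Map[String, String] = Map(
    "url" -> "jdbc:redshift://prod.us-east-1.redshift.amazonaws.com:5439/db",
    "user" -> user,
    "password" -> pwd,
    "dbtable" -> "db.table",
    "tempdir" -> "s3://config/glue/temp/redshift/",
    "forward_spark_s3_credentials" -> "true"
  )

  // Load the table into a DataFrame using an existing SparkSession.
  def load(spark: SparkSession, user: String, pwd: String): DataFrame =
    spark.read
      .format("com.databricks.spark.redshift")
      .options(options(user, pwd))
      .load()
}
```

In the job, `val table = RedshiftRead.load(spark, user, pwd)` would then play the role of the `table` variable in the PySpark version.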