AWS Glue - Truncate destination postgres table prior to insert
Question
I am trying to truncate a Postgres destination table prior to insert and, more generally, to fire external functions using the connections already created in Glue.
Has anyone been able to do this?
Recommended answer
I've tried the DROP/TRUNCATE scenario, but I have not been able to do it with the connections already created in Glue. It does work with a pure Python PostgreSQL driver, pg8000.
- Download the tar of pg8000 from PyPI
- Create an empty __init__.py in the root folder
- Zip up the contents & upload to S3
- Reference the zip file in the Python lib path of the job
- Set the DB connection details as job params (make sure to prepend all key names with --). Tick the "Server-side encryption" box.
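The zip and job-parameter steps above can be sketched with the standard library alone. This is a minimal sketch, not part of the original answer: the helper names zip_package and as_job_arguments are hypothetical, and it assumes the extracted pg8000 sources (plus the empty __init__.py) already sit under one local folder. Uploading the archive to S3 is left out.

```python
import zipfile
from pathlib import Path


def zip_package(src_root, zip_path):
    """Zip every file under src_root, storing paths relative to src_root.

    zip_path should live outside src_root, otherwise the archive
    would try to include itself.
    """
    src_root = Path(src_root)
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src_root.rglob("*")):
            if path.is_file():
                arcname = path.relative_to(src_root).as_posix()
                zf.write(path, arcname)
                names.append(arcname)
    return names


def as_job_arguments(params):
    """Prefix each key with "--", the form the job parameters are entered in."""
    return {"--" + key: value for key, value in params.items()}
```

Inside the job script the same keys are read back without the prefix, for example args['HOST'] for a parameter entered as --HOST.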
Then you can simply create a connection and execute SQL.
import sys
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job
import pg8000
args = getResolvedOptions(sys.argv, [
    'JOB_NAME',
    'PW',
    'HOST',
    'USER',
    'DB'
])
# ...
# Create Spark & Glue context
sc = SparkContext()
glueContext = GlueContext(sc)
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
# ...
config_port = 5432
conn = pg8000.connect(
    database=args['DB'],
    user=args['USER'],
    password=args['PW'],
    host=args['HOST'],
    port=config_port
)
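# schema and table are placeholders (not given in the original answer);
# set them to the destination table you want to empty
schema = "public"
table = "my_table"
query = "TRUNCATE TABLE {0};".format(".".join([schema, table]))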
cur = conn.cursor()
cur.execute(query)
conn.commit()
cur.close()
conn.close()
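If the schema or table name comes from job parameters rather than being hard-coded, it may be worth quoting the identifiers and rolling back on failure. The following is a sketch under assumptions rather than the answer's own method: quote_ident, build_truncate and truncate_table are hypothetical helper names, and conn is assumed to be any DB-API 2.0 connection such as the one returned by pg8000.connect above.

```python
def quote_ident(name):
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_truncate(schema, table):
    """Build the TRUNCATE statement for schema.table with quoted identifiers."""
    return "TRUNCATE TABLE {0}.{1};".format(quote_ident(schema), quote_ident(table))


def truncate_table(conn, schema, table):
    """Run TRUNCATE on a DB-API connection, committing on success."""
    cur = conn.cursor()
    try:
        cur.execute(build_truncate(schema, table))
        conn.commit()
    except Exception:
        # Undo the failed transaction, then let the job see the error
        conn.rollback()
        raise
    finally:
        cur.close()
```

Note that quoted identifiers are case-sensitive in PostgreSQL, so the names passed in must match the stored names exactly.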