How to save a String as JSONB type in PostgreSQL when using AWS Glue

Question

I'm looking for a way to write a string as jsonb type in PostgreSQL. My DynamicFrame has a string column that holds JSON data. When I try to save it to Postgres with

DataSink0 = glueContext.write_dynamic_frame.from_catalog(frame = Transform0, database = "cms", table_name = "cms_public_listings", transformation_ctx = "DataSink0")

I get the following error:

An error occurred while calling o1623.pyWriteDynamicFrame.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 134.0 failed 4 times, most recent failure: Lost task 0.3 in stage 134.0 (TID 137, ip-172-31-27-18.ec2.internal, executor 24): java.sql.BatchUpdateException: Batch entry 0 INSERT INTO "public".listings ([REMOVED_COLUMNS]) VALUES ([REMOVED_VALUES]) was aborted: ERROR: column "schema" is of type jsonb but expression is of type character varying
  Hint: You will need to rewrite or cast the expression.
  Position: 207  Call getNextException to see other errors in the batch.

I can't change the table schema to hold a string, so I either use AWS Glue ETL or have to write a Python Shell job. I would prefer to find a way to do this with PySpark in AWS Glue.

Recommended answer

I prefer to use a native Spark DataFrame, because it allows more customization. Setting the JDBC connection property stringtype to unspecified makes the PostgreSQL driver send string parameters as untyped values, so the server casts the JSON string from the DataFrame to the jsonb column in the table. In this example, my DataFrame has two fields.

from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession

sc = SparkContext.getOrCreate(SparkConf())
spark = SparkSession(sc)

# Load the source data; the JSON column is read as a plain string
df = spark.read.format('csv') \
               .option('delimiter', '|') \
               .option('header', 'True') \
               .load('your_path')

# some transformation...

url = 'jdbc:postgresql://your_host:5432/your_databasename'
properties = {'user': '*****',
              'password': '*****',
              'driver': 'org.postgresql.Driver',
              # send strings untyped so PostgreSQL can cast them to jsonb
              'stringtype': 'unspecified'}

# Append the rows to the existing table
df.write.jdbc(url=url, table='your_tablename', mode='append', properties=properties)
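
Since the question starts from a Glue DynamicFrame rather than a CSV file, the same write can probably be reused by converting the frame with toDF() first. The following is a minimal sketch under that assumption; the helper name write_dynamic_frame_as_jsonb and its parameters are made up for illustration and are not part of the Glue API.

```python
def write_dynamic_frame_as_jsonb(dynamic_frame, url, table, user, password):
    """Append a Glue DynamicFrame to a PostgreSQL table that has jsonb columns.

    Hypothetical helper: only toDF() and DataFrame.write.jdbc() are real calls.
    """
    # DynamicFrame.toDF() returns a regular Spark DataFrame.
    df = dynamic_frame.toDF()

    properties = {
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",
        # Strings are sent as untyped parameters, so PostgreSQL infers
        # the target type (jsonb) from the column definition.
        "stringtype": "unspecified",
    }

    # mode="append" inserts into the existing table.
    df.write.jdbc(url=url, table=table, mode="append", properties=properties)


# Usage inside the Glue job from the question (placeholders, not real values):
# write_dynamic_frame_as_jsonb(
#     Transform0,
#     "jdbc:postgresql://your_host:5432/your_databasename",
#     "your_tablename",
#     "*****",
#     "*****",
# )
```

This bypasses the Data Catalog sink used in the question, so the connection details have to be supplied to the job in some other way.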

Before running the script above, create the table in PostgreSQL, because mode is set to append and the target column must already be defined as jsonb. For example:

create table your_tablename
(
    my_json_field jsonb,
    another_field int
)
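
Because my_json_field is jsonb, PostgreSQL rejects any row whose string is not valid JSON, and one bad row aborts the whole batch. A rough pre-check in plain Python might look like the sketch below; is_valid_json is an illustrative name, and Python's parser is not guaranteed to match PostgreSQL's rules exactly, so treat it as a filter for obvious problems only.

```python
import json


def _reject_constant(name):
    # json.loads calls this for NaN, Infinity and -Infinity,
    # which Python accepts by default but strict JSON does not allow.
    raise ValueError(f"non-standard JSON constant: {name}")


def is_valid_json(text):
    """Return True if text parses as strict JSON, False otherwise."""
    if text is None:
        return True  # None is written as SQL NULL, not as JSON
    try:
        json.loads(text, parse_constant=_reject_constant)
    except (TypeError, ValueError):
        return False
    return True


rows = ['{"a": 1}', '[1, 2, 3]', '{"a": }', 'NaN', None]
bad_rows = [r for r in rows if not is_valid_json(r)]
print(bad_rows)
```

The same function could be wrapped in a Spark UDF to filter or flag rows before the JDBC write.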
