AWS Glue: Rename_field() does not work after relationalize

Problem description

I have a job that needs to perform the following tasks:

  1. Relationalize the data.
  2. Rename the fields whose names contain '.' so that they can be imported into PostgreSQL as normal field names.
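Judging by the field names in the question (vehicle.sourceReccordUID, SalesDeliveryLines.val.shipped.unitDisplayCode), Relationalize names the flattened columns by joining the nested path with '.', which is why step 2 is needed. The following is a simplified pure-Python model of that naming for nested structs only, plus the dot-to-underscore rule from step 2. It does not model how Relationalize pivots arrays into child tables, and flatten_record and to_sql_name are hypothetical helpers, not Glue APIs.

```python
from typing import Any, Dict


def flatten_record(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into one level, joining key paths with '.'.

    Hypothetical helper: a simplified model of struct flattening only
    (arrays are not pivoted into child tables here).
    """
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into the nested struct, carrying the dotted path along
            flat.update(flatten_record(value, path))
        else:
            flat[path] = value
    return flat


def to_sql_name(field_name: str) -> str:
    """Replace dots so the name can be used as a plain PostgreSQL column name."""
    return field_name.replace(".", "_")


order = {"orderId": 1, "vehicle": {"sourceReccordUID": "abc"}}
flat = flatten_record(order)
print(flat)  # {'orderId': 1, 'vehicle.sourceReccordUID': 'abc'}
print({to_sql_name(k): v for k, v in flat.items()})  # {'orderId': 1, 'vehicle_sourceReccordUID': 'abc'}
```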

Here is the code:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# Read the source table from the Glue Data Catalog
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "gluecatalog", table_name = "fcorders", transformation_ctx = "datasource0")
rootTableName = 'orders'

# Relationalize returns a DynamicFrameCollection: the root table plus the tables pivoted out of nested arrays
dfc = Relationalize.apply(frame = datasource0, staging_path = "s3://my-bucket/temp/", name = rootTableName, transformation_ctx = "dfc")
for df_name in dfc.keys():
    m_df = dfc.select(df_name)
    print("Writing to PostgreSQL table:", df_name)
    if df_name != rootTableName:
        # Child table: rename the dotted field that came from the SalesDeliveryLines array
        renamefields4 = m_df.rename_field("SalesDeliveryLines.val.shipped.unitDisplayCode", "shipped_unitDisplayCode")
    else:
        # Root table: rename the dotted field that came from the vehicle struct
        renamefields4 = RenameField.apply(frame = m_df, old_name = "vehicle.sourceReccordUID", new_name = "vehicle_sourceReccordUID", transformation_ctx = "renamefields4")
    renamefields4.printSchema()

printSchema() shows the schema unchanged. If I write to the database, the field names still contain '.'s.

If I use ApplyMapping.apply() to change the field names before relationalizing, the child table disappears. If I use ApplyMapping.apply() after relationalizing, it simply deletes every field whose name contains '.'.

The bottom line is that I cannot relationalize and rename fields in the same job, no matter what I try.

Did I miss something, or is this an AWS Glue bug?

Recommended answer

It has been confirmed that the malfunction of rename_field() and RenameField.apply() is a Glue bug.

The workaround I have so far is to convert the DynamicFrame to a DataFrame, rename the fields on the DataFrame, and then convert it back to a DynamicFrame.

Here is the code:

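    from awsglue.dynamicframe import DynamicFrame
    # Convert the DynamicFrame to a Spark DataFrame and rename every column
    new_df = m_df.toDF()
    for old_name in new_df.schema.names:
        # Remove the "SalesDeliveryLines.val." substring, then turn remaining dots into underscores
        new_name = old_name.replace("SalesDeliveryLines.val.", "").replace(".", "_")
        new_df = new_df.withColumnRenamed(old_name, new_name)
    # Convert the renamed DataFrame back into a DynamicFrame
    m_df = DynamicFrame.fromDF(new_df, glueContext, "m_df")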
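The renaming rule in this workaround can be checked on its own, without a Glue or Spark session. The sketch below applies the same two replace() calls to a plain list of column names. As an extra safeguard that is not part of the original answer, it raises an error when two old names would map to the same new name (for example a.b and a_b), since the renamed DataFrame would then likely carry two columns with the same name. build_rename_map is a hypothetical helper, and "orderId" is an illustrative column name.

```python
from typing import Dict, Iterable

# Substring removed by the answer's loop (the array path left by Relationalize)
PREFIX = "SalesDeliveryLines.val."


def build_rename_map(names: Iterable[str], prefix: str = PREFIX) -> Dict[str, str]:
    """Map each old column name to its new name, rejecting collisions.

    Hypothetical helper that mirrors the answer's renaming rule.
    """
    rename_map: Dict[str, str] = {}
    seen: Dict[str, str] = {}  # new name -> old name that produced it
    for old in names:
        # Same rule as the answer: remove the prefix substring, then dots -> underscores
        new = old.replace(prefix, "").replace(".", "_")
        if new in seen:
            raise ValueError(f"{old!r} and {seen[new]!r} both map to {new!r}")
        seen[new] = old
        rename_map[old] = new
    return rename_map


columns = ["orderId", "SalesDeliveryLines.val.shipped.unitDisplayCode"]
print(build_rename_map(columns))
# {'orderId': 'orderId', 'SalesDeliveryLines.val.shipped.unitDisplayCode': 'shipped_unitDisplayCode'}
```

Inside the Glue job, the returned dict could drive the withColumnRenamed() loop directly (for old, new in build_rename_map(new_df.schema.names).items(): new_df = new_df.withColumnRenamed(old, new)). Names that do not change map to themselves, which withColumnRenamed() treats as a no-op.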
