Display a DataFrame when using PySpark in an AWS Glue job


Question

How can I show a DataFrame in an AWS Glue ETL job?

I tried the code below, but it doesn't display anything.

df.show()

Code

from pyspark.sql import Row
from pyspark.sql.types import StructType, StructField, StringType

datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "flux-test", table_name = "tab1", transformation_ctx = "datasource0")
sourcedf = ApplyMapping.apply(frame = datasource0, mappings = [("id", "long", "id", "long"), ("Rd.Id_Releve", "string", "Rd.Id_R", "string")])
sourcedf = sourcedf.toDF()
data = []
schema = StructType([
    StructField('PM',
        StructType([
            StructField('Pf', StringType(), True),
            StructField('Rd', StringType(), True)
        ])
    ),
])
cibledf = sqlCtx.createDataFrame(data, schema)
cibledf = sqlCtx.createDataFrame(sourcedf.rdd.map(lambda x: Row(PM=Row(Pf=str(x.id_prm), Rd=None))), schema)
print(cibledf.show())
job.commit()
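One detail worth noting in the code above: `DataFrame.show()` prints the table to stdout itself and returns `None`, so `print(cibledf.show())` prints the table followed by an extra `None` line; calling `show()` on its own is enough. A minimal illustration of that pattern, using a stand-in function rather than a real Spark DataFrame:

```python
def show():
    """Stand-in for DataFrame.show(): prints the table itself, returns None."""
    print("+---+\n| id|\n+---+")

show()         # prints the table once
print(show())  # prints the table, then an extra "None" from the outer print()
```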

Answer

In your Glue console, after you run your Glue job, the job listing will have a column for Logs / Error logs.

Click on the Logs link; it takes you to the CloudWatch logs associated with your job. Browse through them for the output of your print statement.
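`df.show()` writes to the driver's stdout, which Glue forwards to that CloudWatch log stream. If the tabular output gets interleaved with other log noise, collecting a small sample and printing it as a single string keeps the rows together in one log entry. A minimal pure-Python sketch (`format_rows` is a hypothetical helper; inside a Glue job the row dicts would come from something like `[r.asDict() for r in df.limit(20).collect()]`):

```python
def format_rows(rows, columns):
    """Render a list of row dicts as one aligned text table for a single log dump."""
    # Column width = widest of the header and every value in that column.
    widths = {c: max([len(c)] + [len(str(r.get(c))) for r in rows]) for c in columns}
    header = " | ".join(c.ljust(widths[c]) for c in columns)
    sep = "-+-".join("-" * widths[c] for c in columns)
    lines = [" | ".join(str(r.get(c)).ljust(widths[c]) for c in columns) for r in rows]
    return "\n".join([header, sep] + lines)

sample = [{"id": 1, "Pf": "a"}, {"id": 2, "Pf": "bb"}]
print(format_rows(sample, ["id", "Pf"]))
```

Printing one multi-line string instead of calling `show()` also makes the output easy to find with a CloudWatch search on the header text.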

Also check here: converting a DynamicFrame to a DataFrame and displaying it.

Added a working / tested code sample.

Code sample:

zipcode_dynamicframe = glueContext.create_dynamic_frame.from_catalog(
       database = "customer_db",
       table_name = "zipcode_master")
zipcode_dynamicframe.printSchema()
zipcode_dynamicframe.toDF().show(10)

Screenshot for zipcode_dynamicframe.show() in the CloudWatch log:
