Spark动态框架显示方法无济于事 [英] Spark dynamic frame show method yields nothing

查看:59
本文介绍了Spark动态框架显示方法无济于事的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

因此,我正在使用AWS Glue自动生成的代码从S3读取csv文件,并将其通过JDBC连接写入表中.看起来很简单,Job成功运行,没有任何错误,但是它什么也没写.当我检查Glue Spark动态框架时,它会显示所有行(使用.count()).但是当在其上执行.show()时,不会产生任何结果.

So I am using AWS Glue auto-generated code to read csv file from S3 and write it to a table over a JDBC connection. Seems simple, Job runs successfully with no error but it writes nothing. When I checked the Glue Spark Dynamic Frame it does contents all the rows (using .count()). But when do a .show() on it yields nothing.

.printSchema()可以正常工作.尝试在使用.show()时记录错误,但没有错误或未打印任何内容.使用.toDF及其有效的show方法将DynamicFrame转换为数据帧. 我认为该文件存在一些问题,试图缩小到某些列.但是即使文件中只有2列,也是一样.用双引号清楚标记字符串,仍然没有成功.

.printSchema() works fine. Tried logging the error while using .show(), but no errors or nothing is printed. Converted the DynamicFrame to the data frame using .toDF and the show method it works. I thought there is some problem with the file, trying to narrow to certain columns. But even with just 2 columns in the file same thing. Clearly marked string in double quotes, still no success.

我们需要从Glue配置中选择诸如JDBC连接之类的东西.我想常规的Spark数据框架无法做到这一点.因此需要动态框架.

We have things like JDBC connection that needs to be picked from Glue configuration. Which I guess regular spark data frame can't do. Hence need dynamic frame working.

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.dynamicframe import DynamicFrame
import logging
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

glueContext = GlueContext(SparkContext.getOrCreate())
spark = glueContext.spark_session

datasource0 = glueContext.create_dynamic_frame.from_options('s3', {'paths': ['s3://bucket/file.csv']}, 'csv', format_options={'withHeader': True,'skipFirst': True,'quoteChar':'"','escaper':'\\'})

datasource0.printSchema()
datasource0.show(5)

输出

root
|-- ORDERID: string
|-- EVENTTIMEUTC: string

这是转换为常规数据帧所产生的结果.

Here is what the converting to regular data frame yields.

datasource0.toDF().show()

输出

+-------+-----------------+
|ORDERID|     EVENTTIMEUTC|
+-------+-----------------+
|      2| "1/13/2018 7:50"|
|      3| "1/13/2018 7:50"|
|      4| "1/13/2018 7:50"|
|      5| "1/13/2018 7:50"|
|      6| "1/13/2018 8:52"|
|      7| "1/13/2018 8:52"|
|      8| "1/13/2018 8:53"|
|      9| "1/13/2018 8:53"|
|     10| "1/16/2018 1:33"|
|     11| "1/16/2018 2:28"|
|     12| "1/16/2018 2:37"|
|     13| "1/17/2018 1:17"|
|     14| "1/17/2018 2:23"|
|     15| "1/17/2018 4:33"|
|     16| "1/17/2018 6:28"|
|     17| "1/17/2018 6:28"|
|     18| "1/17/2018 6:36"|
|     19| "1/17/2018 6:38"|
|     20| "1/17/2018 7:26"|
|     21| "1/17/2018 7:28"|
+-------+-----------------+
only showing top 20 rows

这是一些数据.

ORDERID, EVENTTIMEUTC
1, "1/13/2018 7:10"
2, "1/13/2018 7:50"
3, "1/13/2018 7:50"
4, "1/13/2018 7:50"
5, "1/13/2018 7:50"
6, "1/13/2018 8:52"
7, "1/13/2018 8:52"
8, "1/13/2018 8:53"
9, "1/13/2018 8:53"
10, "1/16/2018 1:33"
11, "1/16/2018 2:28"
12, "1/16/2018 2:37"
13, "1/17/2018 1:17"
14, "1/17/2018 2:23"
15, "1/17/2018 4:33"
16, "1/17/2018 6:28"
17, "1/17/2018 6:28"
18, "1/17/2018 6:36"
19, "1/17/2018 6:38"
20, "1/17/2018 7:26"
21, "1/17/2018 7:28"
22, "1/17/2018 7:29"
23, "1/17/2018 7:46"
24, "1/17/2018 7:51"
25, "1/18/2018 2:22"
26, "1/18/2018 5:48"
27, "1/18/2018 5:50"
28, "1/18/2018 5:50"
29, "1/18/2018 5:51"
30, "1/18/2018 5:53"
100, "1/18/2018 10:32"
101, "1/18/2018 10:33"
102, "1/18/2018 10:33"
103, "1/18/2018 10:42"
104, "1/18/2018 10:59"
105, "1/18/2018 11:16"

推荐答案

在使用Glue ETL时,我们遇到了类似的问题.要打印动态框架,可以使用以下两个选项之一:

We faced similar issue while working with Glue ETL. To print the dynamic frame you can use one of the following two options :

print datasource0.show()

OR

datasource0.toDF().show()

请注意,如果要直接打印动态框架内容,则需要额外的print关键字.

Note that, you need that extra print keyword if you want to directly print the dynamic frame contents.

这篇关于Spark动态框架显示方法无济于事的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆