Spark dynamic frame show method yields nothing


Question

I am using AWS Glue auto-generated code to read a CSV file from S3 and write it to a table over a JDBC connection. It seems simple: the job runs successfully with no errors, but it writes nothing. When I checked the Glue Spark DynamicFrame, it does contain all the rows (verified with .count()). But calling .show() on it yields nothing.

.printSchema() works fine. I tried logging the error when calling .show(), but no error is raised and nothing is printed. When I convert the DynamicFrame to a DataFrame using .toDF(), the show method works. I thought there was a problem with the file and tried narrowing it down to certain columns, but even with just 2 columns in the file the result is the same. I explicitly marked the strings with double quotes, still with no success.

We have things like the JDBC connection that need to be picked up from the Glue configuration, which I guess a regular Spark DataFrame can't do. Hence I need the DynamicFrame to work.

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.dynamicframe import DynamicFrame
import logging
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

glueContext = GlueContext(SparkContext.getOrCreate())
spark = glueContext.spark_session

datasource0 = glueContext.create_dynamic_frame.from_options('s3', {'paths': ['s3://bucket/file.csv']}, 'csv', format_options={'withHeader': True,'skipFirst': True,'quoteChar':'"','escaper':'\\'})

datasource0.printSchema()
datasource0.show(5)

Output

root
|-- ORDERID: string
|-- EVENTTIMEUTC: string

Here is what converting to a regular DataFrame yields.

datasource0.toDF().show()

Output

+-------+-----------------+
|ORDERID|     EVENTTIMEUTC|
+-------+-----------------+
|      2| "1/13/2018 7:50"|
|      3| "1/13/2018 7:50"|
|      4| "1/13/2018 7:50"|
|      5| "1/13/2018 7:50"|
|      6| "1/13/2018 8:52"|
|      7| "1/13/2018 8:52"|
|      8| "1/13/2018 8:53"|
|      9| "1/13/2018 8:53"|
|     10| "1/16/2018 1:33"|
|     11| "1/16/2018 2:28"|
|     12| "1/16/2018 2:37"|
|     13| "1/17/2018 1:17"|
|     14| "1/17/2018 2:23"|
|     15| "1/17/2018 4:33"|
|     16| "1/17/2018 6:28"|
|     17| "1/17/2018 6:28"|
|     18| "1/17/2018 6:36"|
|     19| "1/17/2018 6:38"|
|     20| "1/17/2018 7:26"|
|     21| "1/17/2018 7:28"|
+-------+-----------------+
only showing top 20 rows

Here is some of the data.

ORDERID, EVENTTIMEUTC
1, "1/13/2018 7:10"
2, "1/13/2018 7:50"
3, "1/13/2018 7:50"
4, "1/13/2018 7:50"
5, "1/13/2018 7:50"
6, "1/13/2018 8:52"
7, "1/13/2018 8:52"
8, "1/13/2018 8:53"
9, "1/13/2018 8:53"
10, "1/16/2018 1:33"
11, "1/16/2018 2:28"
12, "1/16/2018 2:37"
13, "1/17/2018 1:17"
14, "1/17/2018 2:23"
15, "1/17/2018 4:33"
16, "1/17/2018 6:28"
17, "1/17/2018 6:28"
18, "1/17/2018 6:36"
19, "1/17/2018 6:38"
20, "1/17/2018 7:26"
21, "1/17/2018 7:28"
22, "1/17/2018 7:29"
23, "1/17/2018 7:46"
24, "1/17/2018 7:51"
25, "1/18/2018 2:22"
26, "1/18/2018 5:48"
27, "1/18/2018 5:50"
28, "1/18/2018 5:50"
29, "1/18/2018 5:51"
30, "1/18/2018 5:53"
100, "1/18/2018 10:32"
101, "1/18/2018 10:33"
102, "1/18/2018 10:33"
103, "1/18/2018 10:42"
104, "1/18/2018 10:59"
105, "1/18/2018 11:16"

Recommended answer

We faced a similar issue while working with Glue ETL. To print the dynamic frame, you can use one of the following two options:

print(datasource0.show())

datasource0.toDF().show()

Note that you need the extra print if you want to print the dynamic frame contents directly.
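The second option can be wrapped in a small helper so that a quick preview is one call. This is only a minimal sketch: `preview` is a hypothetical name, not part of the Glue API, and it assumes the object passed in exposes `toDF()` as a Glue DynamicFrame does.

```python
def preview(frame, num_rows=5):
    """Print the first rows of a DynamicFrame-like object.

    `preview` is a hypothetical helper, not part of awsglue. It assumes
    `frame` has a toDF() method that returns a Spark DataFrame.
    """
    # Convert to a regular Spark DataFrame and let its show() print
    # a formatted table of at most `num_rows` rows to stdout.
    frame.toDF().show(num_rows)
```

Inside the job from the question this would be called as `preview(datasource0, 5)`. The conversion is used for inspection only, so the DynamicFrame itself can still be passed on to the JDBC write.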
