java.io.StreamCorruptedException when importing a CSV to a Spark DataFrame
Problem description
I'm running a Spark cluster in standalone mode. Both Master and Worker nodes are reachable, with logs in the Spark Web UI.
I'm trying to load data into a PySpark session so I can work on Spark DataFrames.
Following several examples (among them, one from the official documentation), I tried different methods, all failing with the same error, e.g.:
from pyspark.conf import SparkConf
from pyspark.context import SparkContext
from pyspark.sql import SparkSession, SQLContext
conf = SparkConf().setAppName('NAME').setMaster('spark://HOST:7077')
sc = SparkContext(conf=conf)
spark = SparkSession.builder.getOrCreate()
# a try
df = spark.read.load('/path/to/file.csv', format='csv', sep=',', header=True)
# another try
sql_ctx = SQLContext(sc)
df = sql_ctx.read.csv('/path/to/file.csv', header=True)
# and a few other tries...
Every time, I get the same error:
Py4JJavaError: An error occurred while calling o81.csv. :
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 192.168.X.X, executor 0):
java.io.StreamCorruptedException: invalid stream header: 0000000B
I'm loading data from both JSON and CSV (tweaking the method calls appropriately, of course); the error is the same for both, every time.
Does anyone understand what the problem is?
Recommended answer
To whom it may concern, I finally figured out the problem thanks to this response.
The pyspark version used for the SparkSession did not match the Spark application version (2.4 vs 2.3).
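This kind of mismatch can be spotted before submitting anything, by comparing the client library's version (`pyspark.__version__`) with the version the cluster reports in its Web UI. A minimal sketch; the helper name and the sample version strings are illustrative, not from the original post:

```python
def versions_match(client_version: str, cluster_version: str) -> bool:
    """True when the major.minor components agree, e.g. '2.3.4' vs '2.3.1'."""
    return client_version.split(".")[:2] == cluster_version.split(".")[:2]

# The client version comes from pyspark.__version__; the cluster's version is
# shown at the top of the Master's Web UI (http://HOST:8080 by default).
print(versions_match("2.4.0", "2.3.2"))  # False -> expect serialization errors
print(versions_match("2.3.4", "2.3.2"))  # True  -> client and cluster agree
```

Spark serializes tasks with Java serialization between driver and executors, so a 2.4 client talking to a 2.3 cluster can surface exactly as an `invalid stream header` StreamCorruptedException on the executor side.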
Re-installing pyspark under version 2.3 instantly solved the issue. #facepalm
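Pinning the client library to the cluster's minor version might look like this (a sketch assuming a pip-based install and a 2.3.x cluster; the exact patch version may differ in your environment):

```shell
# Replace the mismatched client with one from the cluster's 2.3 line.
pip uninstall -y pyspark
pip install "pyspark>=2.3,<2.4"

# Confirm the installed client version before reconnecting to the cluster.
python -c "import pyspark; print(pyspark.__version__)"
```

Using a range specifier rather than a hard-coded patch version keeps the client on the cluster's minor line while still picking up patch releases.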