Spark 异常:写入行时任务失败 [英] Spark Exception : Task failed while writing rows

查看:39
本文介绍了Spark 异常:写入行时任务失败的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在阅读文本文件并将它们转换为镶木地板文件.我正在使用 spark 代码来做这件事.但是当我尝试运行代码时,我得到以下异常

I am reading text files and converting them to parquet files. I am doing it using spark code. But when i try to run the code I get following exception

org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 1.0 failed 4 times, most recent failure: Lost task 2.3 in stage 1.0 (TID 9, XXXX.XXX.XXX.local): org.apache.spark.SparkException: Task failed while writing rows.
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:191)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:160)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insert$1.apply(commands.scala:160)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArithmeticException: / by zero
    at parquet.hadoop.InternalParquetRecordWriter.initStore(InternalParquetRecordWriter.java:101)
    at parquet.hadoop.InternalParquetRecordWriter.<init>(InternalParquetRecordWriter.java:94)
    at parquet.hadoop.ParquetRecordWriter.<init>(ParquetRecordWriter.java:64)
    at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:282)
    at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:252)
    at org.apache.spark.sql.parquet.ParquetOutputWriter.<init>(newParquet.scala:83)
    at org.apache.spark.sql.parquet.ParquetRelation2$$anon$4.newInstance(newParquet.scala:229)
    at org.apache.spark.sql.sources.DefaultWriterContainer.initWriters(commands.scala:470)
    at org.apache.spark.sql.sources.BaseWriterContainer.executorSideSetup(commands.scala:360)
    at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$1(commands.scala:172)
    ... 8 more

我正在尝试以下列方式编写数据帧:

I am trying to write the dataframe in following fashion :

dataframe.write().parquet(Path)

非常感谢任何帮助.

推荐答案

另一个可能的原因是您达到了 s3 请求速率限制.如果您仔细查看您的日志,您可能会看到类似这样的内容

Another possible reason is that you're hitting s3 request rate limits. If you look closely at your logs you may see something like this

AmazonS3Exception:请降低您的请求率.

虽然 Spark UI 会说

While the Spark UI will say

写入行时任务失败

我怀疑这是您遇到问题的原因,但如果您正在执行一项高度密集的工作,这可能是一个原因.所以我只是为了答案的完整性.

I doubt its the reason you're getting an issue, but its a possible reason if you're running a highly intensive job. So I included just for answer's completeness.

这篇关于Spark 异常:写入行时任务失败的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆