Spark times out, maybe due to binaryFiles() with more than 1 million files in HDFS


Problem description

I read the files via

val xmls = sc.binaryFiles(xmlDir)

The operation runs fine locally, but on YARN it fails with:

 client token: N/A
 diagnostics: Application application_1433491939773_0012 failed 2 times due to ApplicationMaster for attempt appattempt_1433491939773_0012_000002 timed out. Failing the application.
 ApplicationMaster host: N/A
 ApplicationMaster RPC port: -1
 queue: default
 start time: 1433750951883
 final status: FAILED
 tracking URL: http://controller01:8088/cluster/app/application_1433491939773_0012
 user: ariskk
Exception in thread "main" org.apache.spark.SparkException: Application finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:622)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:647)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

In the Hadoop userlogs I am frequently getting these messages:

15/06/08 09:15:38 WARN util.AkkaUtils: Error sending message [message = Heartbeat(1,[Lscala.Tuple2;@2b4f336b,BlockManagerId(1, controller01.stratified, 58510))] in 2 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:195)
at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:427)

I run my Spark job via spark-submit, and it works for another HDFS directory that contains only 37k files. Any ideas how to resolve this?

Recommended answer

OK, after getting some help on the Spark mailing list, I found out there were two issues:


  1. The src directory: if it is given as /my_dir/, it makes Spark fail and creates the heartbeat issues. Instead, it should be given as hdfs:///my_dir/* (see the sketch below this list).

  2. An out-of-memory error appears in the logs after fixing #1. This is the Spark driver running on YARN running out of memory due to the number of files (apparently it keeps all file info in memory). So I spark-submitted the job with --conf spark.driver.memory=8g, which fixed the issue.
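
Putting the two fixes together, here is a minimal sketch of the corrected job. The object name, app name, and the count() action are illustrative only; my_dir is the placeholder directory from the answer, not a real path.

import org.apache.spark.{SparkConf, SparkContext}

object BinaryXmlCount {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("binary-xml-count"))

    // Fix #1: fully qualified glob instead of a bare /my_dir/ path.
    // binaryFiles returns an RDD[(String, PortableDataStream)]: one (path, lazy stream) pair per file.
    val xmls = sc.binaryFiles("hdfs:///my_dir/*")

    // Illustrative action only; the original job's processing is not shown in the question.
    println(s"files found: ${xmls.count()}")

    sc.stop()
  }
}

Per fix #2, the job would then be launched with something like spark-submit --master yarn --conf spark.driver.memory=8g so the driver has enough heap to hold the metadata for the roughly 1 million files.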

