Spark jobs failing because HDFS is caching jars

Problem description

I upload Scala/Spark jars to HDFS to test them on our cluster. After running, I frequently realize there are changes that need to be made. So I make the changes locally, then push the new jar back up to HDFS. However, often (not always) when I do this, Hadoop throws an error essentially saying that this jar is not the same as the old jar (duh).

I try clearing my Trash, .staging, and .sparkstaging directories, but that doesn't do anything. I try renaming the jar, which works sometimes and other times doesn't (it's still ridiculous that I have to do this in the first place).

Does anyone know why this is occurring and how I can prevent it? Thanks for any help. Here are some logs, if that helps (I've edited out some paths):

Application application_1475165877428_124781 failed 2 times due to AM Container for appattempt_1475165877428_124781_000002 exited with exitCode: -1000
For more detailed output, check application tracking page: http://examplelogsite/ Then, click on links to logs of each attempt.
Diagnostics: Resource MYJARPATH/EXAMPLE.jar changed on src filesystem (expected 1475433291946, was 1475433292850
java.io.IOException: Resource MYJARPATH/EXAMPLE.jar changed on src filesystem (expected 1475433291946, was 1475433292850
    at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
    at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Failing this attempt. Failing the application.
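The exception is raised in YARN's FSDownload during container localization: the modification time recorded for the resource at submission is compared against the file's current modification time, and any mismatch aborts localization. The sketch below is a simplified local illustration of that comparison only — it uses an ordinary temp file rather than HDFS, and it is not YARN's actual code:

```shell
# Record a file's mtime at "submit" time, modify the file, then compare
# again at "localization" time -- the same mismatch the YARN log reports.
tmp=$(mktemp)
expected=$(stat -c %Y "$tmp" 2>/dev/null || stat -f %m "$tmp")  # GNU stat, BSD fallback
sleep 1
touch "$tmp"                     # simulates re-uploading the jar in place
actual=$(stat -c %Y "$tmp" 2>/dev/null || stat -f %m "$tmp")
if [ "$expected" != "$actual" ]; then
    echo "Resource $tmp changed on src filesystem (expected $expected, was $actual)"
fi
rm -f "$tmp"
```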

Answer

According to your log, I'm sure this comes from the YARN side.
As a workaround, you can modify YARN yourself to skip this exception.
I found this thread because of the same changed on src filesystem error; I ran into this issue and skipped it by modifying the YARN source code.
For more details, you can refer to how-to-fix-resource-changed-on-src-filesystem-issue
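Short of patching YARN, the mismatch can usually be avoided by never overwriting the jar in place at the same HDFS path. The sketch below shows two such upload patterns; the paths and jar names are hypothetical (not from the question), and the `hdfs` calls default to a dry-run `echo` so the script can be read or run without a cluster:

```shell
# Dry-run by default; set HDFS_CMD="hdfs dfs" on a real cluster.
HDFS_CMD="${HDFS_CMD:-echo hdfs dfs}"

LOCAL_JAR="target/example.jar"   # hypothetical local build output
REMOTE_DIR="/user/me/jars"       # hypothetical HDFS directory

# Option 1: remove the old copy first, then upload, instead of
# overwriting the existing file with -put -f.
$HDFS_CMD -rm -skipTrash "$REMOTE_DIR/example.jar"
$HDFS_CMD -put "$LOCAL_JAR" "$REMOTE_DIR/example.jar"

# Option 2: upload under a fresh versioned name each run, so the path a
# submitted application references never changes underneath YARN.
STAMP=$(date +%Y%m%d%H%M%S)
$HDFS_CMD -put "$LOCAL_JAR" "$REMOTE_DIR/example-$STAMP.jar"
```

With option 2, point each spark-submit at the newly versioned path, so no running application ever sees its jar's timestamp change.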
