Hadoop Distributed cache object changed during job


Question


I have a number of sequenced Hadoop jobs in which I need a DistributedCache file.


The driver class (Controller) receives the input from the previous job, modifies a file, places it in the DistributedCache, and starts a new job.


After the first job (i.e., in the second job), I get this error:

java.io.IOException: 
The distributed cache object hdfs://xxxx/xx/x/modelfile2#modelfile2 
changed during the job from 11/8/12 11:55 PM to 11/8/12 11:55 PM


Does anyone know what the problem might be?

Answer


According to the source of TrackerDistributedCacheManager.java (method downloadCacheObject), when this exception happens it is not ignored, and the actual download of the file from HDFS to the local file system does not happen. So the task will not find its file in the distributed cache. I would suspect that you may be registering the same object twice, or that there might be a bug in Hadoop when several jobs put a file with the same name in the distributed cache from the same controller.
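One common workaround is to have the controller write the modified model to a fresh HDFS path for each job instead of overwriting the same file, so the cached object's size and timestamp cannot change while a job is running. A minimal sketch of building such job-specific cache URIs (plain Java; the base path and helper name are hypothetical, and the `#modelfile` fragment is the symlink name tasks would open):

```java
public class CacheUriBuilder {
    // Build a job-specific cache URI: a distinct HDFS object per job,
    // with a stable "#modelfile" symlink fragment so task code can
    // always open the same local name regardless of the job sequence.
    static String cacheUri(String basePath, int jobSeq) {
        return basePath + "/modelfile_" + jobSeq + "#modelfile";
    }

    public static void main(String[] args) {
        // Each sequenced job registers its own HDFS object, so its
        // modification time is fixed for the lifetime of that job.
        String first  = cacheUri("hdfs://namenode/models", 1);
        String second = cacheUri("hdfs://namenode/models", 2);
        System.out.println(first);   // hdfs://namenode/models/modelfile_1#modelfile
        System.out.println(second);  // hdfs://namenode/models/modelfile_2#modelfile
    }
}
```

The controller would then pass each URI to the job's distributed-cache registration (e.g. `DistributedCache.addCacheFile(new URI(cacheUri(base, seq)), conf)` in the old API) exactly once per job, which also guards against the double-registration case suspected above.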

