How to flush Hadoop Distributed Cache?


Question


I have added a set of jars to the Distributed Cache using the DistributedCache.addFileToClassPath(Path file, Configuration conf) method to make the dependencies available to a MapReduce job across the cluster. Now I would like to remove all those jars from the cache so I can start clean and be sure I have the right jar versions there. I commented out the code that adds the files to the cache and also removed them from where I had copied them in HDFS. The problem is that the jars still appear to be on the classpath, because the MapReduce job is not throwing ClassNotFound exceptions. Is there a way to flush this cache without restarting any services?
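For reference, the call in question looks roughly like this (a minimal sketch; the HDFS path /libs/my-dependency.jar is a placeholder, and on Hadoop 2+ the DistributedCache class is deprecated in favor of Job.addFileToClassPath):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;

public class CacheSetup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical path: the jar must already have been copied into HDFS.
        Path depJar = new Path("/libs/my-dependency.jar");
        // Adds the jar to the task classpath of the job submitted with this conf.
        DistributedCache.addFileToClassPath(depJar, conf);
        // ... configure and submit the job using this Configuration ...
    }
}
```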


Subsequently I flushed the following folder: /var/lib/hadoop-hdfs/cache/mapred/mapred/local/taskTracker/distcache/. That did not solve it. The job still finds the references.

Answer


I now understand what my problem was. I had previously copied the jars into the /usr/lib/hadoop/lib/ folder. That made them permanently available to every MapReduce job. After removing them from there, the job threw the expected ClassNotFoundException. I also noticed that if I do not add the jars with addFileToClassPath, they are not available to the job. So there is no need to flush the Distributed Cache or to remove what you have added with addFileToClassPath, because what you put there is visible only to that specific job instance.
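The per-job scoping described above can be sketched with the newer Job API (a hedged illustration; the jar path is hypothetical, and Job.addFileToClassPath is the Hadoop 2+ replacement for the static DistributedCache call):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class PerJobClasspath {
    public static void main(String[] args) throws Exception {
        // Each Job instance carries its own classpath additions;
        // nothing persists on the cluster between submissions.
        Job job = Job.getInstance(new Configuration(), "job-with-deps");
        job.addFileToClassPath(new Path("/libs/my-dependency.jar")); // hypothetical HDFS path
        // By contrast, jars dropped into /usr/lib/hadoop/lib/ sit on every
        // task's classpath permanently until they are deleted from disk.
    }
}
```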

