Accessing files in hadoop distributed cache


Problem Description



I want to use the distributed cache to allow my mappers to access data. In my main method, I'm using the command

    DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"), conf);

where /user/peter/cacheFile/testCache1 is a file that exists in HDFS.

Then, my setup function looks like this:

    public void setup(Context context) throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
        // etc
    }

However, this localFiles array is always null.

I was initially running on a single-host cluster for testing, but I read that this would prevent the distributed cache from working. I tried a pseudo-distributed setup, but that didn't work either.

I'm using Hadoop 1.0.3.

Thanks,
Peter

Solution

The problem here was that I was doing the following:

    Configuration conf = new Configuration();
    Job job = new Job(conf, "wordcount");
    DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"), conf);

Since the Job constructor makes an internal copy of the conf instance, adding the cache file afterwards has no effect on the job's configuration. Instead, I should do this:

    Configuration conf = new Configuration();
    DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"), conf);
    Job job = new Job(conf, "wordcount");

And now it works. Thanks to Harsh on the hadoop-user mailing list for the help.
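
For completeness, here is a minimal sketch of what the // etc part of setup might look like once the cache is wired up correctly. This is my own illustration rather than code from the original question: the class name CacheAwareMapper, the lookup-set field, and the pass-through map logic are all hypothetical. The key point is that the Path returned by getLocalCacheFiles refers to a copy on the task node's local disk, so it can be read with plain java.io.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.HashSet;
    import java.util.Set;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Illustrative mapper (hypothetical class name) that loads a lookup set
    // from the distributed cache in setup() and uses it in map().
    public class CacheAwareMapper extends Mapper<LongWritable, Text, Text, Text> {

        private final Set<String> cachedLines = new HashSet<String>();

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            Configuration conf = context.getConfiguration();
            // Local, task-side copies of the files registered with
            // DistributedCache.addCacheFile in the driver.
            Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
            if (localFiles == null || localFiles.length == 0) {
                throw new IOException("No cache files found; was addCacheFile "
                        + "called before the Job was constructed?");
            }
            // The Path points at the local filesystem of the task node,
            // so plain java.io is enough to read it.
            BufferedReader reader = new BufferedReader(new FileReader(localFiles[0].toString()));
            try {
                String line;
                while ((line = reader.readLine()) != null) {
                    cachedLines.add(line);
                }
            } finally {
                reader.close();
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Hypothetical use: emit only records that appear in the cached file.
            if (cachedLines.contains(value.toString())) {
                context.write(value, new Text("found"));
            }
        }
    }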
