Accessing files in hadoop distributed cache
Question
I want to use the distributed cache to allow my mappers to access data. In main, I'm using the command
DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"), conf);
where /user/peter/cacheFile/testCache1 is a file that exists in HDFS.
Then, my setup function looks like this:
public void setup(Context context) throws IOException, InterruptedException {
Configuration conf = context.getConfiguration();
Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
//etc
}
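Once getLocalCacheFiles returns a non-null Path[], each entry points at a copy of the file on the task's local disk, so it can be read with ordinary java.io and no HDFS client. Below is a minimal, self-contained sketch of just the reading step; the file name and contents are stand-ins, and the Hadoop-specific lookup is omitted:

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class CacheFileReadDemo {
    // Reads a localized cache file line by line with plain java.io —
    // the paths from getLocalCacheFiles are on the local filesystem.
    static List<String> loadLines(File localFile) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new FileReader(localFile))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the localized cache file (hypothetical contents).
        File f = File.createTempFile("testCache1", ".txt");
        f.deleteOnExit();
        try (FileWriter out = new FileWriter(f)) {
            out.write("alpha\nbeta\n");
        }
        System.out.println(loadLines(f)); // prints [alpha, beta]
    }
}
```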
However, this localFiles array is always null.
I was initially running on a single-host cluster for testing, but I read that this can prevent the distributed cache from working. I tried a pseudo-distributed setup, but that didn't work either.
I'm using Hadoop 1.0.3.
Thanks, Peter
The problem here was that I was doing the following:
Configuration conf = new Configuration();
Job job = new Job(conf, "wordcount");
DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"), conf);
Since the Job constructor makes an internal copy of the conf instance, adding the cache file afterwards has no effect on the submitted job. Instead, I should do this:
Configuration conf = new Configuration();
DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"), conf);
Job job = new Job(conf, "wordcount");
And now it works. Thanks to Harsh on the Hadoop user mailing list for the help.
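The root cause is the defensive copy inside the Job constructor. The same pitfall can be illustrated without a Hadoop cluster; MiniJob below is a hypothetical stand-in that snapshots its configuration map on construction, just as new Job(conf, ...) copies conf:

```java
import java.util.HashMap;
import java.util.Map;

public class ConfCopyDemo {
    // Hypothetical mini "Job": the constructor takes a defensive copy of the
    // configuration, so later changes to the original map are invisible to it.
    static class MiniJob {
        private final Map<String, String> conf;
        MiniJob(Map<String, String> conf) {
            this.conf = new HashMap<>(conf); // internal copy, like new Job(conf, ...)
        }
        String get(String key) { return conf.get(key); }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        MiniJob job = new MiniJob(conf);                // job snapshots conf here
        conf.put("cacheFile", "/user/peter/cacheFile/testCache1"); // too late
        System.out.println(job.get("cacheFile"));       // prints null

        Map<String, String> conf2 = new HashMap<>();
        conf2.put("cacheFile", "/user/peter/cacheFile/testCache1"); // before construction
        MiniJob job2 = new MiniJob(conf2);
        System.out.println(job2.get("cacheFile"));      // prints /user/peter/cacheFile/testCache1
    }
}
```

The same ordering rule applies to any setting staged on a Configuration: mutate it first, construct the Job last.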