Hadoop DistributedCache is deprecated - what is the preferred API?
My map tasks need some configuration data, which I would like to distribute via the Distributed Cache.
The Hadoop MapReduce Tutorial shows the usage of the DistributedCache class, roughly as follows:
// In the driver
JobConf conf = new JobConf(getConf(), WordCount.class);
...
DistributedCache.addCacheFile(new Path(filename).toUri(), conf);
// In the mapper
Path[] myCacheFiles = DistributedCache.getLocalCacheFiles(job);
...
However, DistributedCache is marked as deprecated in Hadoop 2.2.0.
What is the new preferred way to achieve this? Is there an up-to-date example or tutorial covering this API?
The APIs for the Distributed Cache can be found in the Job class itself; see the documentation here: http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapreduce/Job.html
The code should be something like:
Job job = Job.getInstance(new Configuration()); // new Job() is itself deprecated in Hadoop 2.x
...
job.addCacheFile(new Path(filename).toUri());
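Putting the pieces together, a fuller driver might look like the sketch below. This assumes Hadoop 2.x; the class name `Driver`, the job name, and the cache path `/user/me/config.txt` are placeholders for your own job, and it cannot run without the Hadoop client libraries on the classpath.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class Driver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Job.getInstance(conf, name) replaces the deprecated new Job() constructor
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(Driver.class);
        // Ship the configuration file to each task's local working area;
        // the path is a placeholder for a file already in HDFS
        job.addCacheFile(new URI("/user/me/config.txt"));
        // ... set mapper, reducer, input and output paths as usual ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```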
In your mapper code:
Path[] localPaths = context.getLocalCacheFiles();
...
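One caveat: in later 2.x releases `getLocalCacheFiles()` is itself deprecated, and the suggested replacement is `context.getCacheFiles()`, which returns the `URI[]` as added in the driver; the cached file is then symlinked into the task's working directory under its base name. A mapper sketch under that assumption (the class name `MyMapper` and the per-line parsing are placeholders; this will not run outside a Hadoop task):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // URIs as registered via job.addCacheFile() in the driver
        URI[] cacheFiles = context.getCacheFiles();
        if (cacheFiles != null && cacheFiles.length > 0) {
            // The file is symlinked into the task's cwd under its base name
            Path cached = new Path(cacheFiles[0].getPath());
            try (BufferedReader reader = new BufferedReader(new FileReader(cached.getName()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // parse one line of configuration data here
                }
            }
        }
    }
}
```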