Hadoop DistributedCache is deprecated - what is the preferred API?


Problem description


My map tasks need some configuration data, which I would like to distribute via the Distributed Cache.

The Hadoop MapReduce Tutorial shows the usage of the DistributedCache class, roughly as follows:

// In the driver
JobConf conf = new JobConf(getConf(), WordCount.class);
...
DistributedCache.addCacheFile(new Path(filename).toUri(), conf); 

// In the mapper
Path[] myCacheFiles = DistributedCache.getLocalCacheFiles(job);
...

However, DistributedCache is marked as deprecated in Hadoop 2.2.0.

What is the new preferred way to achieve this? Is there an up-to-date example or tutorial covering this API?

Solution

The APIs for the Distributed Cache can be found in the Job class itself. Check the documentation here: http://hadoop.apache.org/docs/stable2/api/org/apache/hadoop/mapreduce/Job.html

The code should be something like:

Job job = Job.getInstance();
...
job.addCacheFile(new Path(filename).toUri());

In your mapper code:

Path[] localPaths = context.getLocalCacheFiles();
...
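Putting the driver and mapper pieces together, a minimal sketch of the Job-based API might look like the following. The class name, mapper types, and the HDFS path are illustrative assumptions, and it presumes Hadoop 2.x on the classpath; note that `context.getCacheFiles()` (returning URIs) is used here rather than the also-deprecated `getLocalCacheFiles()`:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class CacheExample {

    public static class ConfigAwareMapper
            extends Mapper<LongWritable, Text, Text, Text> {

        @Override
        protected void setup(Context context)
                throws IOException, InterruptedException {
            // Files added with job.addCacheFile() are localized on each
            // node and symlinked into the task's working directory under
            // the file's name (or the '#fragment' of the URI, if given).
            URI[] cacheFiles = context.getCacheFiles();
            if (cacheFiles != null && cacheFiles.length > 0) {
                Path cached = new Path(cacheFiles[0].getPath());
                try (BufferedReader reader =
                        new BufferedReader(new FileReader(cached.getName()))) {
                    // ... parse the configuration data here ...
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "cache example");
        job.setJarByClass(CacheExample.class);
        job.setMapperClass(ConfigAwareMapper.class);
        // Distribute the configuration file to every task.
        // (Hypothetical HDFS path, for illustration only.)
        job.addCacheFile(new URI("/path/on/hdfs/config.dat"));
        // ... set input/output paths, reducer, etc. ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Reading the file in `setup()` rather than in `map()` keeps the parsing cost to once per task instead of once per record.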
