Distributed Cache feature in YARN

Views: 144 · Published: 2020/11/22 2:19:51 · Tag: hadoop

Question

I am currently using the MapReduce framework on YARN, with Hadoop running in pseudo-distributed mode. I want to use the "Distributed Cache" feature to add some files to the cache and read them in my map function. How can I achieve this?

Recommended answer

How to add files to the distributed cache:

Using the hadoop command-line option:

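hadoop jar <application jar> <main class> -files <absolute path to distributed cache file> <input> <output>

Generic options such as -files must come before the application's own arguments, and they are only parsed when the driver runs through ToolRunner (or uses GenericOptionsParser directly).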

Using the distributed cache API:

job.addCacheFile(uri);

Both the hadoop -files option and the distributed cache API copy the cache files to every task node and make them available to the mapper/reducer during execution.
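
Once a cache file has been copied to a task node, the task usually sees it in its working directory under a link name. The sketch below shows how that name is normally derived from the cache URI: the URI fragment if one is given, otherwise the last path segment. It uses only the JDK; the class name CacheLinkName, the helper linkName, and the example host and paths are hypothetical and exist only for illustration, and the naming rule is stated here as an assumption about typical Hadoop 2 behaviour rather than something this answer spells out.

```java
import java.net.URI;

final class CacheLinkName {

    // Returns the name a cache entry is expected to have in the task's
    // working directory: the URI fragment if present, else the file name.
    // Assumption: this mirrors how Hadoop names the localized link.
    static String linkName(URI cacheUri) {
        String fragment = cacheUri.getFragment();
        if (fragment != null && !fragment.isEmpty()) {
            return fragment;
        }
        String path = cacheUri.getPath();
        if (path == null || path.isEmpty()) {
            throw new IllegalArgumentException("URI has no path: " + cacheUri);
        }
        // Keep only the part after the last '/'
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        // Hypothetical URIs, for illustration only
        System.out.println(linkName(URI.create("hdfs://namenode:8020/data/lookup.txt")));
        System.out.println(linkName(URI.create("hdfs://namenode:8020/data/lookup.txt#codes")));
    }
}
```

Passing a URI with a fragment to job.addCacheFile is a convenient way to give the file a short, stable name that the mapper can open without knowing its original location.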

How to access the distributed cache:

Override the setup method in the Mapper/Reducer and call getCacheFiles on the context. Sample code below:

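    // Requires: java.io.File, java.io.FileNotFoundException, java.io.IOException,
    // java.net.URI, org.apache.hadoop.fs.Path
    @Override
    protected void setup(Context context)
            throws IOException, InterruptedException {

        // getCacheFiles() returns URIs (not Paths) and may return null
        URI[] cacheFiles = context.getCacheFiles();
        if (cacheFiles == null || cacheFiles.length == 0) {
            throw new FileNotFoundException("Distributed cache file not found.");
        }
        // On YARN the file is linked into the task's working directory,
        // normally under its file name (or the URI fragment, if one was given)
        File localFile = new File(new Path(cacheFiles[0].getPath()).getName());
        // code to process cache file

    }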

The context.getCacheFiles method returns an array of URIs for the files registered in the job Configuration, or null if none were registered.
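
The "code to process cache file" step is left open above. A common use is to load a small lookup table into memory once in setup and consult it in every map call. The following is a minimal sketch of that idea using only the JDK. It assumes a hypothetical tab-separated key/value file; the class name CacheLookup, the method parseLines, and the sample data are illustrative and not part of any Hadoop API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CacheLookup {

    // Builds an in-memory lookup table from lines of the form "key<TAB>value".
    // Lines without a tab are skipped; a repeated key keeps the last value.
    static Map<String, String> parseLines(List<String> lines) {
        Map<String, String> lookup = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.split("\t", 2);
            if (parts.length == 2) {
                lookup.put(parts[0], parts[1]);
            }
        }
        return lookup;
    }

    public static void main(String[] args) {
        // Hypothetical cache file contents, for illustration only
        List<String> lines = Arrays.asList("US\tUnited States", "malformed", "DE\tGermany");
        Map<String, String> lookup = parseLines(lines);
        System.out.println(lookup.get("US"));
        System.out.println(lookup.size());
    }
}
```

Inside setup, the lines could come from java.nio.file.Files.readAllLines(localFile.toPath()), with the resulting map stored in a field of the mapper so that map can read from it without touching the file again.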
