How to use a MapReduce output in Distributed Cache
Question
Let's say I have a MapReduce job which creates an output file part-00000, and there is another job that runs after this job completes.
How can I use the output file of the first job in the Distributed Cache for the second job?
Answer
The steps below might help you:
- Pass the first job's output directory path to the second job's driver class.
- Use a PathFilter to list the files whose names start with part-. Refer to the code snippet below for your second job's driver class:
FileSystem fs = FileSystem.get(conf);
FileStatus[] fileList = fs.listStatus(new Path("1st job o/p path"),
    new PathFilter() {
        @Override
        public boolean accept(Path path) {
            return path.getName().startsWith("part-");
        }
    });
- Iterate over every part- file and add it to the distributed cache.
for (int i = 0; i < fileList.length; i++) {
    // Path.toUri() already returns a URI; wrapping it in "new URI(...)"
    // (as in the original snippet) does not compile, since java.net.URI
    // has no constructor taking a URI.
    DistributedCache.addCacheFile(fileList[i].getPath().toUri(), conf);
}
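To see the listing-and-filtering step work without a Hadoop cluster, the same idea can be sketched against the local filesystem with plain Java. This is only an illustration of the part- filter logic above, not the Hadoop API: the class and method names (PartFileLister, listPartFiles, isPartFile) are made up for this example, and on a real cluster you would use FileSystem.listStatus with a PathFilter and DistributedCache.addCacheFile as shown in the snippets.

```java
import java.io.File;
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Local-filesystem analogue of the driver-side listing step:
// collect the URIs of all part-* files in a job output directory.
public class PartFileLister {

    // Mirrors the PathFilter above: accept only names starting with "part-".
    static boolean isPartFile(String name) {
        return name.startsWith("part-");
    }

    // Returns the URIs of every part-* file in dir; on a real cluster these
    // are the URIs you would pass to DistributedCache.addCacheFile.
    static List<URI> listPartFiles(File dir) {
        List<URI> uris = new ArrayList<>();
        File[] files = dir.listFiles((d, name) -> isPartFile(name));
        if (files != null) {
            for (File f : files) {
                uris.add(f.toURI());
            }
        }
        return uris;
    }

    public static void main(String[] args) throws IOException {
        // Create a throwaway directory imitating a job output dir.
        File out = java.nio.file.Files.createTempDirectory("job-output").toFile();
        new File(out, "part-00000").createNewFile();
        new File(out, "part-00001").createNewFile();
        new File(out, "_SUCCESS").createNewFile();

        // Only the two part-* files are picked up; _SUCCESS is skipped.
        System.out.println(listPartFiles(out).size() + " part files found");
    }
}
```

Note that a MapReduce output directory also contains marker files such as _SUCCESS, which is exactly why the prefix filter matters before adding files to the cache.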