Accessing a file in the Mapper through the Distributed Cache


Problem description

I want to access the contents of a file in the Distributed Cache from my Mapper. Below is the code I have written; it only outputs the names of the files in the Distributed Cache. Please help me access the contents of the file.

    public class DistCacheExampleMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {

        Text a = new Text();
        Path[] dates = new Path[0];

        public void configure(JobConf conf) {
            try {
                dates = DistributedCache.getLocalCacheFiles(conf);
                String astr = dates.toString();
                a = new Text(astr);
            } catch (IOException ioe) {
                System.err.println("Caught exception while getting cached files: "
                        + StringUtils.stringifyException(ioe));
            }
        }

        @Override
        public void map(LongWritable key, Text value, OutputCollector<Text, Text> output,
                Reporter reporter) throws IOException {
            String line = value.toString();
            for (Path cacheFile : dates) {
                output.collect(new Text(line), new Text(cacheFile.getName()));
            }
        }
    }

Solution

Try this instead in your configure() method:

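// Needs java.io.BufferedReader, java.io.FileReader, java.io.IOException,
// java.util.ArrayList and java.util.List in addition to the Hadoop imports.
List<String[]> lines;
Path[] files = new Path[0];

public void configure(JobConf conf) {
    lines = new ArrayList<>();
    try {
        files = DistributedCache.getLocalCacheFiles(conf);
        if (files == null || files.length == 0) {
            return; // no file was placed in the Distributed Cache
        }
        // try-with-resources closes the reader even if readLine() throws
        try (BufferedReader reader = new BufferedReader(new FileReader(files[0].toString()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // each entry of lines is a String array, one element per column
                lines.add(line.split(","));
            }
        }
    } catch (IOException ioe) {
        System.err.println("Caught exception while getting cached files: "
                + StringUtils.stringifyException(ioe));
    }
}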

This way, the contents of the file in the Distributed Cache (here, the first file) end up in the variable lines. Each entry of lines is a String array holding one row of the file, split on ','. So the first column of the first row is lines.get(0)[0], the third column of the second row is lines.get(1)[2], and so on.
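To try the parsing step without a Hadoop cluster, here is a minimal sketch of the same read-and-split logic as a standalone class. The class name CacheFileLoader, the method load, and the sample rows are made up for illustration and are not part of Hadoop or of the original answer. A temporary file stands in for the local copy that Hadoop would place on the task node.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not a Hadoop class): loads a local comma-separated file into memory.
class CacheFileLoader {

    // Reads the file at localPath and splits every line on ','.
    static List<String[]> load(String localPath) throws IOException {
        List<String[]> lines = new ArrayList<>();
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(localPath), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line.split(","));
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a file that has been copied to the task's local disk (sample data is invented).
        Path tmp = Files.createTempFile("cache-example", ".csv");
        Files.write(tmp, Arrays.asList("2012-01-01,a,b", "2012-01-02,c,d"), StandardCharsets.UTF_8);

        List<String[]> lines = load(tmp.toString());
        System.out.println(lines.get(0)[0]); // first column of the first row
        System.out.println(lines.get(1)[2]); // third column of the second row

        Files.delete(tmp);
    }
}
```

Inside configure(), the same helper could presumably be called as CacheFileLoader.load(files[0].toString()), which keeps the file handling out of the Mapper itself.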
