How to read multiple image files as input from hdfs in map-reduce?


Problem description




import java.awt.image.BufferedImage;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import javax.imageio.ImageIO;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// AutoColorCorrelogram comes from the LIRE library; the exact package path
// may differ between LIRE versions.
import net.semanticmetadata.lire.imageanalysis.AutoColorCorrelogram;

// The original snippet omits the class declaration; run()/getConf() and
// ToolRunner.run(new image_mapreduce(), ...) imply Configured + Tool.
public class image_mapreduce extends Configured implements Tool {

    private static String[] testFiles = new String[] {
            "img01.JPG", "img02.JPG", "img03.JPG", "img04.JPG",
            "img06.JPG", "img07.JPG", "img05.JPG"};
    // private static String testFilespath = "/home/student/Desktop/images";
    private static String testFilespath = "hdfs://localhost:54310/user/root/images";
    // private static String indexpath = "/home/student/Desktop/indexDemo";
    private static String testExtensive = "/home/student/Desktop/images";

    public static class MapClass extends MapReduceBase
            implements Mapper<Text, Text, Text, Text> {

        private Text input_image = new Text();
        private Text input_vector = new Text();

        @Override
        public void map(Text key, Text value, OutputCollector<Text, Text> output,
                        Reporter reporter) throws IOException {
            System.out.println("CorrelogramIndex Method:");
            String featureString;
            int MAXIMUM_DISTANCE = 16;
            AutoColorCorrelogram.Mode mode = AutoColorCorrelogram.Mode.FullNeighbourhood;
            for (String identifier : testFiles) {
                // NOTE: FileInputStream only opens local files, so the
                // hdfs:// path above cannot be read this way -- this is the
                // core of the question.
                try (FileInputStream fis = new FileInputStream(testFilespath + "/" + identifier)) {
                    // Document doc = builder.createDocument(fis, identifier);
                    // FileInputStream imageStream = new FileInputStream(testFilespath + "/" + identifier);
                    BufferedImage bimg = ImageIO.read(fis);
                    AutoColorCorrelogram vd = new AutoColorCorrelogram(MAXIMUM_DISTANCE, mode);
                    vd.extract(bimg);
                    featureString = vd.getStringRepresentation();
                    double[] bytearray = vd.getDoubleHistogram();
                    System.out.println("image: " + identifier + " " + featureString);

                    System.out.println(" ------------- ");
                    // Emit the file name as the key and its feature vector as
                    // the value; kept inside the try block so featureString is
                    // definitely assigned before use.
                    input_image.set(identifier);
                    input_vector.set(featureString);
                    output.collect(input_image, input_vector);
                }
            }
        }
    }

    public static class Reduce extends MapReduceBase
            implements Reducer<Text, Text, Text, Text> {

        @Override
        public void reduce(Text key, Iterator<Text> values,
                           OutputCollector<Text, Text> output,
                           Reporter reporter) throws IOException {
            // String.concat() returns a new string and does not modify the
            // receiver, so out_vector.concat(...) discarded every value; a
            // StringBuilder accumulates them correctly.
            StringBuilder out_vector = new StringBuilder();
            while (values.hasNext()) {
                out_vector.append(values.next().toString());
            }
            output.collect(key, new Text(out_vector.toString()));
        }
    }

    static int printUsage() {
        System.out.println("image_mapreduce [-m <maps>] [-r <reduces>] <input> <output>");
        ToolRunner.printGenericCommandUsage(System.out);
        return -1;
    }

    @Override
    public int run(String[] args) throws Exception {
        JobConf conf = new JobConf(getConf(), image_mapreduce.class);
        conf.setJobName("image_mapreduce");

        // Both output keys and values are Text (file name -> feature vector).
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);

        conf.setMapperClass(MapClass.class);
        // conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        List<String> other_args = new ArrayList<String>();
        for (int i = 0; i < args.length; ++i) {
            try {
                if ("-m".equals(args[i])) {
                    conf.setNumMapTasks(Integer.parseInt(args[++i]));
                } else if ("-r".equals(args[i])) {
                    conf.setNumReduceTasks(Integer.parseInt(args[++i]));
                } else {
                    other_args.add(args[i]);
                }
            } catch (NumberFormatException except) {
                System.out.println("ERROR: Integer expected instead of " + args[i]);
                return printUsage();
            } catch (ArrayIndexOutOfBoundsException except) {
                System.out.println("ERROR: Required parameter missing from " + args[i - 1]);
                return printUsage();
            }
        }

        FileInputFormat.setInputPaths(conf, other_args.get(0));
        // FileInputFormat.setInputPaths(conf, new Path("hdfs://localhost:54310/user/root/images"));
        FileOutputFormat.setOutputPath(conf, new Path(other_args.get(1)));

        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new image_mapreduce(), args);
        System.exit(res);
    }
}

I am writing a program that takes multiple image files stored in HDFS as input and extracts features in the map function. How can I specify the path (some parameter) in FileInputStream to read the images? Or is there any way to read multiple image files?

What I want to do is: take multiple image files in HDFS as input, extract features in the map function, and reduce iteratively. Please help me with the code, or suggest better ways to do it.

Solution

Look into using the HIPI library - it stores a collection of images in an ImageBundle (which is more efficient than storing the individual image files in HDFS). They have a couple of examples too.

As for your code, you need to specify the input and output formats you plan to use. No stock input format hands an entire file over as a single record, but you can extend FileInputFormat and create a RecordReader that emits <Text, BytesWritable> pairs, where the key is the filename and the value is the bytes of the image file.

In fact, Hadoop: The Definitive Guide has an example of this exact input format (a WholeFileInputFormat whose RecordReader delivers one whole file per record):
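The book's version emits NullWritable keys; below is a rough, untested adaptation (not the book's verbatim code) written against the old org.apache.hadoop.mapred API used in the question, with the file name emitted as the Text key as suggested above:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

// Each input file becomes exactly one record: key = file name, value = raw bytes.
public class WholeFileInputFormat extends FileInputFormat<Text, BytesWritable> {

    @Override
    protected boolean isSplitable(FileSystem fs, Path filename) {
        return false; // never split an image file across map tasks
    }

    @Override
    public RecordReader<Text, BytesWritable> getRecordReader(
            InputSplit split, JobConf job, Reporter reporter) throws IOException {
        return new WholeFileRecordReader((FileSplit) split, job);
    }
}

class WholeFileRecordReader implements RecordReader<Text, BytesWritable> {

    private final FileSplit fileSplit;
    private final Configuration conf;
    private boolean processed = false;

    WholeFileRecordReader(FileSplit fileSplit, Configuration conf) {
        this.fileSplit = fileSplit;
        this.conf = conf;
    }

    @Override
    public boolean next(Text key, BytesWritable value) throws IOException {
        if (processed) {
            return false; // only one record per file
        }
        // Read the whole file from HDFS into the value; the key is the file name.
        byte[] contents = new byte[(int) fileSplit.getLength()];
        Path file = fileSplit.getPath();
        FileSystem fs = file.getFileSystem(conf);
        FSDataInputStream in = null;
        try {
            in = fs.open(file);
            IOUtils.readFully(in, contents, 0, contents.length);
        } finally {
            IOUtils.closeStream(in);
        }
        key.set(file.getName());
        value.set(contents, 0, contents.length);
        processed = true;
        return true;
    }

    @Override
    public Text createKey() { return new Text(); }

    @Override
    public BytesWritable createValue() { return new BytesWritable(); }

    @Override
    public long getPos() { return processed ? fileSplit.getLength() : 0; }

    @Override
    public float getProgress() { return processed ? 1.0f : 0.0f; }

    @Override
    public void close() throws IOException {
        // nothing to close; the stream is closed in next()
    }
}

Wiring it in would then be a matter of calling conf.setInputFormat(WholeFileInputFormat.class) in run(), and declaring the mapper as Mapper<Text, BytesWritable, Text, Text> so it can decode each image with ImageIO.read(new ByteArrayInputStream(value.getBytes(), 0, value.getLength())) instead of opening a FileInputStream against an hdfs:// path.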
