How to use Hadoop InputFormats in Apache Spark?
Question
I have a class ImageInputFormat in Hadoop which reads images from HDFS. How can I use my InputFormat in Spark? Here is my ImageInputFormat:
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class ImageInputFormat extends FileInputFormat<Text, ImageWritable> {

    @Override
    public ImageRecordReader createRecordReader(InputSplit split,
            TaskAttemptContext context) throws IOException, InterruptedException {
        return new ImageRecordReader();
    }

    // Images are binary blobs, so never split a file across records.
    @Override
    protected boolean isSplitable(JobContext context, Path filename) {
        return false;
    }
}
The SparkContext has a method called hadoopFile. It accepts classes implementing the interface org.apache.hadoop.mapred.InputFormat, and its description says "Get an RDD for a Hadoop file with an arbitrary InputFormat". Note, however, that hadoopFile is for the old mapred API; the ImageInputFormat above extends the new mapreduce FileInputFormat, so the matching method is newAPIHadoopFile, which accepts an org.apache.hadoop.mapreduce.InputFormat. Also have a look at the Hadoop Datasets section of the Spark documentation.
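A minimal sketch of the driver code, in Java to match the InputFormat above. The class name ImageLoader and the path hdfs:///images are hypothetical; ImageInputFormat and ImageWritable are the classes from the question and are assumed to be on the classpath.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class ImageLoader {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("ImageLoader"));

        // ImageInputFormat extends the new (mapreduce) API, so use
        // newAPIHadoopFile; hadoopFile expects the old mapred API.
        JavaPairRDD<Text, ImageWritable> images = sc.newAPIHadoopFile(
                "hdfs:///images",      // hypothetical input path
                ImageInputFormat.class,
                Text.class,
                ImageWritable.class,
                new Configuration());

        System.out.println("Loaded " + images.count() + " images");
        sc.stop();
    }
}

If the driver is written in Scala instead, the equivalent call is sc.newAPIHadoopFile[Text, ImageWritable, ImageInputFormat]("hdfs:///images").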