Reading a simple Avro file from HDFS


Problem description

I am trying to do a simple read of an Avro file stored in HDFS. I found out how to read it when it is on the local file system....

FileReader<GenericRecord> reader = DataFileReader.openReader(
        new File(filename), new GenericDatumReader<GenericRecord>());

for (GenericRecord datum : reader) {
    String value = datum.get(1).toString();
    System.out.println("value = " + value);
}

reader.close();

My file is in HDFS, however. I cannot give the openReader a Path or an FSDataInputStream. How can I simply read an Avro file in HDFS?

Edit: I got this to work by creating a custom class (SeekableHadoopInput) that implements SeekableInput. I "stole" this from "Ganglion" on GitHub. Still, it seems like there should be a built-in Hadoop/Avro integration path for this.

Thanks

Recommended answer

The FsInput class (in the avro-mapred submodule, since it depends on Hadoop) can do this. It provides the seekable input stream that is needed for Avro data files.

import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.FileReader;
import org.apache.avro.file.SeekableInput;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.mapred.FsInput;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

Path path = new Path("/path/on/hdfs");
Configuration config = new Configuration(); // make this your Hadoop env config
SeekableInput input = new FsInput(path, config);
DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>();
FileReader<GenericRecord> fileReader = DataFileReader.openReader(input, reader);

for (GenericRecord datum : fileReader) {
    System.out.println("value = " + datum);
}

fileReader.close(); // also closes underlying FsInput
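Since FsInput ships in the avro-mapred artifact rather than the core avro jar, the build also needs that dependency alongside a Hadoop client. A minimal Maven fragment might look like the following — the version numbers are illustrative, so match them to the Avro and Hadoop versions on your cluster:

```xml
<!-- Provides org.apache.avro.mapred.FsInput -->
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-mapred</artifactId>
  <version>1.7.7</version> <!-- illustrative; use your Avro version -->
</dependency>
<!-- Provides Path, Configuration, and HDFS access -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.7.3</version> <!-- illustrative; match your cluster's Hadoop version -->
</dependency>
```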
