Sequence Files in Hadoop
Question
How are these sequence files generated? I saw a page about sequence files here:
http://wiki.apache.org/hadoop/SequenceFile
Are they written using the default Java serializer? And how do I read a sequence file?
Recommended answer
Sequence files are typically generated by MapReduce tasks and can be used as a common format for transferring data between MapReduce jobs.
You can read them as follows:
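import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.*;

Configuration config = new Configuration();
Path path = new Path(PATH_TO_YOUR_FILE);
SequenceFile.Reader reader = new SequenceFile.Reader(FileSystem.get(config), path, config);
// The file header records the key and value classes, so instantiate holders of those types
WritableComparable key = (WritableComparable) reader.getKeyClass().newInstance();
Writable value = (Writable) reader.getValueClass().newInstance();
while (reader.next(key, value)) {
    // process the current key and value here
}
reader.close();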
You can also generate sequence files yourself using SequenceFile.Writer.
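As a rough illustration of the writer side, here is a minimal sketch that assumes Hadoop 2 or later on the classpath and a default Configuration that resolves paths to the local file system. The class name SequenceFileWriteSketch, the helper methods write and read, the choice of IntWritable keys and Text values, and the default file name example.seq are all made up for this example. Note that the keys and values are Hadoop Writable types, which serialize themselves through the Writable interface and do not rely on java.io.Serializable.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileWriteSketch {

    // Writes one (index, word) record per element of `words` to `path`.
    static void write(Configuration conf, Path path, String[] words) throws IOException {
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(path),
                SequenceFile.Writer.keyClass(IntWritable.class),
                SequenceFile.Writer.valueClass(Text.class))) {
            // Writable objects are mutable, so one key and one value are reused
            IntWritable key = new IntWritable();
            Text value = new Text();
            for (int i = 0; i < words.length; i++) {
                key.set(i);
                value.set(words[i]);
                writer.append(key, value);
            }
        }
    }

    // Reads the file back and returns each record as "key=value".
    static List<String> read(Configuration conf, Path path) throws IOException {
        List<String> records = new ArrayList<>();
        try (SequenceFile.Reader reader =
                new SequenceFile.Reader(conf, SequenceFile.Reader.file(path))) {
            IntWritable key = new IntWritable();
            Text value = new Text();
            while (reader.next(key, value)) {
                records.add(key.get() + "=" + value);
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // "example.seq" is a hypothetical default output path
        Path path = new Path(args.length > 0 ? args[0] : "example.seq");
        write(conf, path, new String[] {"alpha", "beta", "gamma"});
        read(conf, path).forEach(System.out::println);
    }
}
```

The key and value classes passed to the writer are stored in the file header, which is why a reader can later recover them with getKeyClass() and getValueClass().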