Reading an Excel file in Hadoop MapReduce


Question


I am trying to read an Excel file containing some data for aggregation in Hadoop. The MapReduce program seems to work fine, but the output it produces is unreadable. Do I need a special InputFormat to read Excel files in Hadoop MapReduce? My configuration is as follows:

Configuration conf = getConf();
// Job.getInstance replaces the deprecated Job(Configuration, String) constructor
Job job = Job.getInstance(conf, "LatestWordCount");
job.setJarByClass(FlightDetailsCount.class);
Path input = new Path(args[0]);
Path output = new Path(args[1]);
FileInputFormat.setInputPaths(job, input);
FileOutputFormat.setOutputPath(job, output);
job.setMapperClass(MapClass.class);
job.setReducerClass(ReduceClass.class);
//job.setCombinerClass(ReduceClass.class);
// TextInputFormat splits the input on line breaks and passes each line to map() as a Text value
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
//job.setOutputKeyClass(Text.class);
//job.setOutputValueClass(Text.class);
// Inside Tool.run(), return the exit code and let main() call System.exit
return job.waitForCompletion(true) ? 0 : 1;


The output produced looks like this:

�KW ��O�A��]n��Ε��r3�\n"���p�饚6W�jJ���9W�f=��9ml��dR�y/Ք��7�^�i ��M*Ք�^nz��l��^�)��妗j�(��dRͱ/7�TS*��M//7�TS��&�jZ��o��TSR�7�@�)�o��TӺ��5{%��+��ۆ�w6-��=�e�_}m�)~��ʅ��ژ���: #�j�]��u����>

Answer


I don't know whether anyone has actually developed a custom InputFormat for MS Excel files (I doubt it, and a quick search turns up nothing), but you certainly cannot read an Excel file using TextInputFormat: XLS files are binary.
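The sketch below illustrates why TextInputFormat cannot make sense of these files. It is not part of the original answer. It classifies a file by its leading "magic" bytes: legacy .xls files begin with the OLE2 Compound File signature `D0 CF 11 E0 A1 B1 1A E1`, and .xlsx files are ZIP archives that begin with `PK\3\4`. The class and method names are hypothetical. Because these bytes are not valid UTF-8 text, splitting them on newline bytes and displaying the result as UTF-8 produces replacement characters (�) like the ones in the question's output.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: guesses a spreadsheet file type from its first bytes. */
class ExcelSignature {

    // Legacy .xls files are OLE2 Compound File Binary documents.
    static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // .xlsx files are ZIP archives, which start with the local file header "PK\3\4".
    static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    static String classify(byte[] header) {
        if (startsWith(header, OLE2)) return "xls (OLE2 binary)";
        if (startsWith(header, ZIP)) return "xlsx (ZIP archive)";
        return "unknown";
    }

    public static void main(String[] args) {
        // Simulate the first 16 bytes of an .xls file (signature padded with zeros).
        byte[] header = Arrays.copyOf(OLE2, 16);
        System.out.println(classify(header));
        // Decoding these bytes as UTF-8 replaces invalid sequences with U+FFFD.
        String decoded = new String(header, StandardCharsets.UTF_8);
        System.out.println(decoded.indexOf('\uFFFD') >= 0);
    }
}
```

Running `main` should report the simulated header as an OLE2 binary, which confirms that the bytes are not line-oriented text.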


Solution: export your Excel file to CSV or TSV. You will then be able to load it using TextInputFormat.
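Once the data is exported to CSV, each line reaches `map()` as a `Text` value and still has to be split into fields. The following is a minimal sketch, under the assumption that the exporter wraps fields containing commas in double quotes and escapes embedded quotes as `""`. The `CsvLine` class and its `parse` method are hypothetical helpers, not a Hadoop API. The sketch does not handle quoted fields that contain line breaks, because TextInputFormat has already split the file on newlines before `map()` sees it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper a Mapper could call on each line of an exported CSV file. */
class CsvLine {

    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // "" inside quotes is an escaped quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // commas inside quotes are kept
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);        // start the next field
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field (possibly empty)
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parse("AA101,\"New York, NY\",\"say \"\"hi\"\"\",42"));
    }
}
```

Inside `map(LongWritable key, Text value, Context context)` you could call `CsvLine.parse(value.toString())`. With TextInputFormat the key is the line's byte offset in the file, so a header row exported from Excel is the record whose key is 0. For TSV exports, a plain `value.toString().split("\t", -1)` is usually enough.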
