CSV processing in Hadoop
I have 6 fields in a CSV file:
- the first is the student name (a `String`)
- the others are the student's marks, e.g. subject 1, subject 2, etc.
I am writing a MapReduce job in Java, splitting each line on commas and emitting the student name as the key and the marks as the value of the map. In the reducer I process them, outputting the student name as the key and their marks plus the total, average, etc. as the value.
I think there may be an alternative, more efficient way to do this. Does anyone have an idea of a better way to perform these operations? Are there any built-in functions in Hadoop that can group by student name and calculate the total and average marks associated with that student?
You might want to have a look at Pig (http://pig.apache.org/), which provides a simple language on top of Hadoop that lets you perform many standard tasks with much shorter code.
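For the question's six-column file, the whole job shrinks to a few lines of Pig Latin. This is a hedged sketch, not a tested script: the file path `students.csv`, the field names, and the assumption of exactly five integer marks per row are all illustrative.

```pig
-- Hypothetical input: name,m1,m2,m3,m4,m5 per line
rows = LOAD 'students.csv' USING PigStorage(',')
       AS (name:chararray, m1:int, m2:int, m3:int, m4:int, m5:int);
totals = FOREACH rows GENERATE
         name,
         (m1 + m2 + m3 + m4 + m5) AS total,
         (double)(m1 + m2 + m3 + m4 + m5) / 5.0 AS average;
DUMP totals;
```

If a student can appear on more than one row, add a `GROUP rows BY name` step and aggregate with the built-in `SUM` and `AVG` functions instead of computing per-row totals. Pig compiles this into MapReduce jobs, so the grouping and aggregation the question asks about come for free.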