CSV processing in Hadoop


Question

I have six fields in a CSV file:

  • the first is the student name (String)
  • the others are the student's marks for subject 1, subject 2, etc.

I am writing a MapReduce job in Java: the mapper splits each line on commas and emits the student name as the key and the marks as the value.

In the reducer I process them, emitting the student name as the key and their marks plus the total, average, etc. as the value.
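
The mapper/reducer approach described above can be tried locally with the minimal sketch below. It uses only the Java standard library, not the Hadoop API: `map` and `reduce` stand in for the bodies of `Mapper.map` and `Reducer.reduce`, and the `TreeMap` in `main` plays the part of Hadoop's shuffle, which groups (and sorts) mapper output by key. The class name, method names, the assumption that every mark is an integer, and the sample rows are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: plain Java, no Hadoop classes. It only mirrors the
// map -> shuffle -> reduce data flow of the job described in the question.
public class StudentMarks {

    // Map step: split one CSV line on commas; the first field becomes the
    // key (student name), the remaining fields become the value (marks).
    // Assumes every mark parses as an integer.
    static Map.Entry<String, List<Integer>> map(String line) {
        String[] fields = line.split(",");
        List<Integer> marks = new ArrayList<>();
        for (int i = 1; i < fields.length; i++) {
            marks.add(Integer.parseInt(fields[i].trim()));
        }
        return Map.entry(fields[0].trim(), marks);
    }

    // Reduce step: receives every mark list emitted for one student and
    // builds the output value "marks total=... avg=...". The key (student
    // name) is emitted alongside this value by the caller.
    static String reduce(Iterable<List<Integer>> values) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> marks : values) {
            all.addAll(marks);
        }
        int total = 0;
        for (int m : all) {
            total += m;
        }
        double average = all.isEmpty() ? 0.0 : (double) total / all.size();
        return all + " total=" + total + " avg=" + average;
    }

    public static void main(String[] args) {
        // Illustrative input only: a name followed by five marks.
        String[] lines = {"alice,70,80,90,60,50", "bob,55,65,75,85,95"};

        // Shuffle step: group mapper output by key; TreeMap also sorts the
        // keys, as Hadoop does before calling the reducer.
        Map<String, List<List<Integer>>> grouped = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<String, List<Integer>> kv = map(line);
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }

        grouped.forEach((name, values) -> System.out.println(name + "\t" + reduce(values)));
    }
}
```

Running `main` prints one line per student, for example `alice`, a tab, then `[70, 80, 90, 60, 50] total=350 avg=70.0`.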

I think there may be an alternative, more efficient way to do this.

Does anyone have an idea of a better way to do these operations?

Are there any built-in Hadoop functions that can group by student name and calculate the total marks and average for that student?

Solution

You might want to have a look at Pig (http://pig.apache.org/), which provides a simple scripting language, Pig Latin, on top of Hadoop that lets you perform many standard tasks with much shorter code.
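
In Pig this kind of job is typically a `GROUP` on the name followed by a `FOREACH` that applies built-in aggregates such as `SUM` and `AVG`. Pig Latin itself is not shown here; as a rough analogy only, the sketch below expresses the same group-then-aggregate idea declaratively with Java streams in a single JVM (no Hadoop, no Pig). `StudentStats`, `statsByName`, `toNameMarkPairs`, the integer-marks assumption and the sample rows are hypothetical; `IntSummaryStatistics` collects the count, sum and average of each group in one pass.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical analogy: plain Java streams, not Pig and not a MapReduce job.
public class StudentStats {

    // Turn one CSV line "name,m1,m2,..." into (name, mark) pairs.
    // Assumes every mark parses as an integer.
    static Stream<Map.Entry<String, Integer>> toNameMarkPairs(String line) {
        String[] fields = line.split(",");
        String name = fields[0].trim();
        return Arrays.stream(fields, 1, fields.length)
                .map(mark -> Map.entry(name, Integer.parseInt(mark.trim())));
    }

    // Group all marks by student name (keys sorted) and summarize each
    // group: count, sum and average are computed together.
    static Map<String, IntSummaryStatistics> statsByName(List<String> lines) {
        return lines.stream()
                .flatMap(StudentStats::toNameMarkPairs)
                .collect(Collectors.groupingBy(
                        Map.Entry::getKey,
                        TreeMap::new,
                        Collectors.summarizingInt(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        // Illustrative input only; "bob" appears twice to show the grouping.
        List<String> lines = List.of(
                "alice,70,80,90,60,50",
                "bob,55,65,75,85,95",
                "bob,100,90,80,70,60");
        statsByName(lines).forEach((name, s) ->
                System.out.println(name + "\ttotal=" + s.getSum() + " avg=" + s.getAverage()));
    }
}
```

With the sample rows, `bob` appears on two lines, so his ten marks are combined into one total (775) and average (77.5).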
