Loading csv data into Hbase


Problem Description

I am very new to hadoop and hbase and have some conceptual questions that trip me up during every tutorial I've found.

I have hadoop and hbase running on a single node within an Ubuntu VM on my Win 7 system. I have a csv file that I would like to load into a single hbase table.

The columns are: loan_number, borrower_name, current_distribution_date, loan_amount

I know that I need to write a MapReduce job to load this csv file into hbase. The following tutorial describes the Java needed to write this MapReduce job:
http://salsahpc.indiana.edu/ScienceCloud/hbase_hands_on_1.htm

What I'm missing is:

Where do I save these files, and where do I compile them? Should I compile them on my Win 7 machine running Visual Studio 12 and then move them to the Ubuntu VM?

I read this SO question and its answers, but I guess I'm still missing the basics: Loading CSV File into Hbase table using MapReduce

I can't find anything covering these basic hadoop/hbase logistics. Any help would be greatly appreciated.

Solution

There is no need to write a MapReduce job to bulk load data into HBase. There are several ways to bulk load data into HBase:

1) Use HBase tools like importtsv and completebulkload: http://hbase.apache.org/book/arch.bulk.load.html
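For illustration, an importtsv run for this csv might look like the sketch below. importtsv defaults to tab-separated input, so the separator has to be overridden for a csv. The table name loans, the column family cf, and the HDFS paths are assumed names for this example, loan_number is mapped to the row key, and the table must already exist:

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
    -Dimporttsv.separator=',' \
    -Dimporttsv.columns=HBASE_ROW_KEY,cf:borrower_name,cf:current_distribution_date,cf:loan_amount \
    -Dimporttsv.bulk.output=/tmp/loans_hfiles \
    loans /user/hduser/loans.csv

hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/loans_hfiles loans

The first command writes HFiles to /tmp/loans_hfiles instead of going through the normal write path; the second (completebulkload) moves them into the live table. Dropping the -Dimporttsv.bulk.output option makes importtsv write through the HBase API directly instead.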



2) Use Pig to bulk load data. Example:

A = LOAD '/hbasetest.txt' USING PigStorage(',') AS
      (strdata:chararray, intdata:long);
STORE A INTO 'hbase://mydata'
        USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
              'mycf:intdata');



3) Do it programmatically using the HBase API. I have a small project called hbaseloader that loads files into an HBase table (the table has just one ColumnFamily with the content of the file). Take a look at it: you just need to define the structure of your table and modify the code to read a csv file and parse it.
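A minimal sketch of that approach, assuming a table named loans with a single column family cf (both hypothetical names, created beforehand in the hbase shell), using loan_number as the row key and the classic HTable/Put client API of that era, with no error handling or batching:

import java.io.BufferedReader;
import java.io.FileReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class CsvLoader {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // assumes the table 'loans' with column family 'cf' already exists
        HTable table = new HTable(conf, "loans");
        BufferedReader reader = new BufferedReader(new FileReader(args[0]));
        String line;
        while ((line = reader.readLine()) != null) {
            String[] f = line.split(",");
            Put put = new Put(Bytes.toBytes(f[0]));  // loan_number as the row key
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("borrower_name"), Bytes.toBytes(f[1]));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("current_distribution_date"), Bytes.toBytes(f[2]));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("loan_amount"), Bytes.toBytes(f[3]));
            table.put(put);
        }
        reader.close();
        table.close();
    }
}

This compiles with javac on the Ubuntu VM with the hadoop and hbase jars on the classpath; there is no need to involve Visual Studio on the Windows side.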



4) Do it programmatically using a MapReduce job, like in the example you mentioned.
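If you do want the MapReduce route, a map-only job that parses each csv line into a Put and writes it to HBase through TableOutputFormat could look roughly like this sketch (same assumed table loans and column family cf as above):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class CsvToHBaseJob {

    // One Put per csv line; no reducer is needed for a straight load.
    static class CsvMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] f = line.toString().split(",");
            Put put = new Put(Bytes.toBytes(f[0]));  // loan_number as the row key
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("borrower_name"), Bytes.toBytes(f[1]));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("current_distribution_date"), Bytes.toBytes(f[2]));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("loan_amount"), Bytes.toBytes(f[3]));
            context.write(new ImmutableBytesWritable(put.getRow()), put);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "csv-to-hbase");
        job.setJarByClass(CsvToHBaseJob.class);
        job.setMapperClass(CsvMapper.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // Wires up TableOutputFormat for the 'loans' table; null reducer plus
        // zero reduce tasks makes the mappers write straight into the table.
        TableMapReduceUtil.initTableReducerJob("loans", null, job);
        job.setNumReduceTasks(0);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}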


