Loading csv data into Hbase
I am very new to hadoop and hbase and have some conceptual questions that are tripping me up during every tutorial I've found.
I have hadoop and hbase running on a single node within an Ubuntu VM on my Win 7 system. I have a csv file that I would like to load into a single hbase table.
The columns are: loan_number, borrower_name, current_distribution_date, loan_amount
I know that I need to write a MapReduce job to load this csv file into hbase. The following tutorial describes the Java needed to write this MapReduce job: http://salsahpc.indiana.edu/ScienceCloud/hbase_hands_on_1.htm
What I'm missing is:
Where do I save these files and where do I compile them? Should I compile this on my Win 7 machine running Visual Studio 2012 and then move it to the Ubuntu VM?
I read this SO question and answers but I guess I'm still missing the basics: Loading CSV File into Hbase table using MapReduce
I can't find anything covering these basic hadoop/hbase logistics. Any help would be greatly appreciated.
There is no need to code a MapReduce job to bulk load data into HBase. There are several ways to bulk load data into HBase:
1) Use HBase tools like importtsv and completebulkload: http://hbase.apache.org/book/arch.bulk.load.html
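For the four-column CSV from the question, an importtsv invocation might look like the following. This is a sketch, not a definitive command: the table name `loans`, the column-family name `cf`, and the HDFS file path are assumptions, and the target table must already exist.

```shell
# Create the target table first (from the hbase shell):
#   create 'loans', 'cf'

# Parse the CSV and write each line into the 'loans' table.
# -Dimporttsv.separator switches the delimiter from tab to comma;
# HBASE_ROW_KEY marks which field becomes the row key (here, loan_number).
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  '-Dimporttsv.separator=,' \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:borrower_name,cf:current_distribution_date,cf:loan_amount \
  loans /user/hduser/loans.csv

# For a true bulk load, add -Dimporttsv.bulk.output=<hdfs dir> so importtsv
# writes HFiles instead of Puts, then hand those files to completebulkload:
#   hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles <hdfs dir> loans
```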
2) Use Pig to bulk load data. Example:
A = LOAD '/hbasetest.txt' USING PigStorage(',')
    AS (strdata:chararray, intdata:long);
STORE A INTO 'hbase://mydata'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('mycf:intdata');
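To run the Pig snippet above (saved as, say, bulkload.pig — the filename is an assumption), the table and column family named in the STORE must exist first:

```shell
# Create the target table and column family from the hbase shell:
#   create 'mydata', 'mycf'

# Run the script; Pig turns the STORE into a MapReduce job that writes
# one cell per input line into the 'mycf' column family. On a single-node
# setup you can also use local mode: pig -x local bulkload.pig
pig bulkload.pig
```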
3) Do it programmatically using the HBase API. I have a small project called hbaseloader that loads files into an HBase table (the table has just one ColumnFamily holding the content of the file). Take a look at it; you just need to define the structure of your table and modify the code to read a csv file and parse it.
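The parsing step mentioned above could be sketched like this. Note this is a sketch, not the project's actual code: the class and method names are made up, and the commented-out HBase calls assume the hbase-client API is on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

/** Parses one line of the loan CSV into the columns named in the question. */
public class LoanCsvParser {

    static final String[] COLUMNS = {
        "loan_number", "borrower_name", "current_distribution_date", "loan_amount"
    };

    /** Returns a column-name -> value map; loan_number doubles as the row key. */
    public static Map<String, String> parseLine(String line) {
        String[] fields = line.split(",", -1);
        if (fields.length != COLUMNS.length) {
            throw new IllegalArgumentException(
                "Expected " + COLUMNS.length + " fields, got " + fields.length);
        }
        Map<String, String> row = new HashMap<>();
        for (int i = 0; i < COLUMNS.length; i++) {
            row.put(COLUMNS[i], fields[i].trim());
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, String> row = parseLine("1001,Jane Doe,2013-06-01,250000");
        // With the HBase client API, each parsed line would become a Put
        // keyed by loan_number, roughly:
        //   Put put = new Put(Bytes.toBytes(row.get("loan_number")));
        //   put.add(Bytes.toBytes("cf"), Bytes.toBytes("borrower_name"),
        //           Bytes.toBytes(row.get("borrower_name")));
        //   table.put(put);
        System.out.println(row.get("borrower_name"));
    }
}
```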
4) Do it programmatically using a MapReduce job, as in the example you mentioned.