Loading csv data into Hbase
Question
I am very new to Hadoop and HBase, and some conceptual questions have been tripping me up during every tutorial I've found.
I have Hadoop and HBase running on a single node inside an Ubuntu VM on my Windows 7 system. I have a csv file that I would like to load into a single HBase table.
The columns are: loan_number, borrower_name, current_distribution_date, loan_amount
I know that I need to write a MapReduce job to load this csv file into HBase. The following tutorial describes the Java needed to write this MapReduce job: http://salsahpc.indiana.edu/ScienceCloud/hbase_hands_on_1.htm
What I'm missing is:
Where do I save these files and where do I compile them? Should I compile them on my Windows 7 machine running Visual Studio 2012 and then move them to the Ubuntu VM?
I read this SO question and its answers, but I guess I'm still missing the basics: Loading CSV File into Hbase table using MapReduce
I can't find anything covering these basic Hadoop/HBase logistics. Any help would be greatly appreciated.
Answer
There is no need to write a MapReduce job to bulk load data into HBase. There are several ways to bulk load data into HBase:
1) Use HBase tools like importtsv and completebulkload: http://hbase.apache.org/book/arch.bulk.load.html
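As a concrete sketch of option 1 (the table name loans, column family cf, and HDFS path are hypothetical, chosen to match the question's columns): you would first create the table in the HBase shell, then run importtsv with a comma separator. Note that importtsv does a plain split on the separator, so quoted fields containing commas are not handled.

```shell
# Create the target table in the HBase shell first (assumed names):
#   create 'loans', 'cf'

# Direct load: HBASE_ROW_KEY maps the first csv field (loan_number) to the row key.
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.separator=, \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:borrower_name,cf:current_distribution_date,cf:loan_amount \
  loans /user/hduser/loans.csv

# Alternatively, generate HFiles first, then hand them to the bulk loader:
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.separator=, \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:borrower_name,cf:current_distribution_date,cf:loan_amount \
  -Dimporttsv.bulk.output=/tmp/loans_hfiles \
  loans /user/hduser/loans.csv
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/loans_hfiles loans
```

The second form is the importtsv + completebulkload pipeline the linked book chapter describes: it writes HFiles to an output directory instead of talking to the region servers, and LoadIncrementalHFiles then moves those files into the table, which is much faster for large datasets.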
2) Use Pig to bulk load data. Example:
A = LOAD '/hbasetest.txt' USING PigStorage(',') AS
        (strdata:chararray, intdata:long);
STORE A INTO 'hbase://mydata'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'mycf:intdata');
3) Do it programmatically using the HBase API. I have a small project called hbaseloader that loads files into an HBase table (the table has just one ColumnFamily holding the content of the file). Take a look at it; you only need to define the structure of your table and modify the code to read a csv file and parse it.
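As a sketch of what option 3 could look like for this csv (this is not the hbaseloader code itself; the loans table, cf column family, and file path are hypothetical, and it assumes the HBase 1.x client API, where older versions use Put.add instead of Put.addColumn):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class CsvToHBase {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("loans"));
             BufferedReader in = new BufferedReader(new FileReader("loans.csv"))) {
            String line;
            while ((line = in.readLine()) != null) {
                // Naive split: does not handle quoted fields containing commas.
                String[] f = line.split(",");
                Put put = new Put(Bytes.toBytes(f[0])); // loan_number as the row key
                put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("borrower_name"), Bytes.toBytes(f[1]));
                put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("current_distribution_date"), Bytes.toBytes(f[2]));
                put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("loan_amount"), Bytes.toBytes(f[3]));
                table.put(put);
            }
        }
    }
}
```

This needs the hbase-client dependency on the classpath and a running cluster to connect to; for large files you would batch the Puts (table.put(List&lt;Put&gt;)) rather than sending them one at a time.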
4) Do it programmatically with a MapReduce job, like the one in the example you mentioned.