Loading csv data into Hbase


Question

I am very new to Hadoop and HBase, and some conceptual questions have tripped me up in every tutorial I've found.

I have Hadoop and HBase running on a single node within an Ubuntu VM on my Windows 7 system. I have a csv file that I would like to load into a single HBase table.

The columns are: loan_number, borrower_name, current_distribution_date, loan_amount

I know that I need to write a MapReduce job to load this csv file into HBase. The following tutorial describes the Java needed to write that job: http://salsahpc.indiana.edu/ScienceCloud/hbase_hands_on_1.htm

What I'm missing is:

Where do I save these files and where do I compile them? Should I compile them on my Windows 7 machine running Visual Studio 12 and then move them to the Ubuntu VM?

I read this SO question and its answers, but I guess I'm still missing the basics: Loading CSV File into Hbase table using MapReduce

I can't find anything covering these basic Hadoop/HBase logistics. Any help would be greatly appreciated.

Answer

There is no need to write a MapReduce job to bulk load data into HBase. There are several ways to bulk load data into HBase:

1) Use HBase tools like importtsv and completebulkload: http://hbase.apache.org/book/arch.bulk.load.html
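
On a single-node setup, the importtsv route might look like the following sketch. The table name `loans`, the column family `loan`, and the HDFS paths are assumptions for illustration; loan_number is used as the row key:

```
# Create the target table first (in the hbase shell):
#   create 'loans', 'loan'

# Copy the CSV into HDFS (paths here are examples):
hadoop fs -put loans.csv /user/hduser/loans.csv

# Parse the CSV into HFiles. importtsv expects tabs by default,
# so override the separator, and map each field to a column:
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.separator=, \
  -Dimporttsv.columns=HBASE_ROW_KEY,loan:borrower_name,loan:current_distribution_date,loan:loan_amount \
  -Dimporttsv.bulk.output=/user/hduser/loans_hfiles \
  loans /user/hduser/loans.csv

# Move the generated HFiles into the table (the completebulkload step):
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /user/hduser/loans_hfiles loans
```

If you drop the `-Dimporttsv.bulk.output` option, importtsv writes directly to the table via Puts instead of producing HFiles, and the completebulkload step is not needed.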

2) Use Pig to bulk load data. Example:

A = LOAD '/hbasetest.txt' USING PigStorage(',') as 
      (strdata:chararray, intdata:long);
STORE A INTO 'hbase://mydata'
        USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
              'mycf:intdata');
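
The generic example above can be adapted to the loan CSV. A sketch, assuming a table named `loans` with column family `loan` already exists and the file sits at an example HDFS path; HBaseStorage uses the first field as the row key:

```
loans = LOAD '/user/hduser/loans.csv' USING PigStorage(',') as
          (loan_number:chararray, borrower_name:chararray,
           current_distribution_date:chararray, loan_amount:double);
STORE loans INTO 'hbase://loans'
        USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
              'loan:borrower_name loan:current_distribution_date loan:loan_amount');
```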

3) Do it programmatically using the HBase API. I have a small project called hbaseloader that loads files into an HBase table (the table has just one ColumnFamily holding the content of the file). Take a look at it: you just need to define the structure of your table and modify the code to read a csv file and parse it.
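
The parsing step that option 3 asks you to add can be sketched in plain Java. The row key choice (loan_number) and the column family name `loan` are assumptions; the actual HBase write is shown only as comments, since it needs the hbase-client dependency and a running cluster:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CsvToHBaseSketch {

    // Extract the row key from one CSV line (first field: loan_number).
    static String rowKey(String line) {
        return line.split(",", -1)[0];
    }

    // Map the remaining CSV fields to "family:qualifier" -> value cells.
    static Map<String, String> parseRow(String line) {
        String[] f = line.split(",", -1);
        Map<String, String> cells = new LinkedHashMap<>();
        cells.put("loan:borrower_name", f[1]);
        cells.put("loan:current_distribution_date", f[2]);
        cells.put("loan:loan_amount", f[3]);
        return cells;
    }

    public static void main(String[] args) {
        String line = "1001,John Doe,2013-05-01,250000";
        System.out.println(rowKey(line) + " -> " + parseRow(line));
        // With the hbase-client dependency, each parsed row becomes a Put:
        //   Put p = new Put(Bytes.toBytes(rowKey(line)));
        //   p.add(Bytes.toBytes("loan"), Bytes.toBytes("borrower_name"),
        //         Bytes.toBytes(parseRow(line).get("loan:borrower_name")));
        //   table.put(p);
    }
}
```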

4) Do it programmatically with a MapReduce job, as in the example you mentioned.
