Loading csv data into Hbase
I am very new to hadoop and hbase and have some conceptual questions that are tripping me up during every tutorial I've found.
I have hadoop and hbase running on a single node within an Ubuntu VM on my Win 7 system. I have a csv file that I would like to load into a single hbase table.
The columns are: loan_number, borrower_name, current_distribution_date, loan_amount
I know that I need to write a MapReduce job to load this csv file into hbase. The following tutorial describes the Java needed to write this MapReduce job: http://salsahpc.indiana.edu/ScienceCloud/hbase_hands_on_1.htm
What I'm missing is:
Where do I save these files and where do I compile them? Should I compile this on my Win 7 machine running Visual Studio 2012 and then move it to the Ubuntu VM?
I read this SO question and answers but I guess I'm still missing the basics: Loading CSV File into Hbase table using MapReduce
I can't find anything covering these basic hadoop/hbase logistics. Any help would be greatly appreciated.
There is no need to code a MapReduce job to bulk load data into HBase. There are several ways to bulk load data into HBase:
1) Use HBase tools like importtsv and completebulkload: http://hbase.apache.org/book/arch.bulk.load.html
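For the four-column CSV from the question, an importtsv invocation might look like the following. This is a sketch, not a definitive command: the table name `loans`, the column-family name `cf`, and the HDFS file path are assumptions, and the target table must already exist.

```shell
# Create the target table first (from the hbase shell):
#   create 'loans', 'cf'

# Parse the CSV and write each line into the 'loans' table.
# -Dimporttsv.separator switches the delimiter from tab to comma;
# HBASE_ROW_KEY marks which field becomes the row key (here, loan_number).
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  '-Dimporttsv.separator=,' \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:borrower_name,cf:current_distribution_date,cf:loan_amount \
  loans /user/hduser/loans.csv

# For a true bulk load, add -Dimporttsv.bulk.output=<hdfs dir> so importtsv
# writes HFiles instead of Puts, then hand those files to completebulkload:
#   hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles <hdfs dir> loans
```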
2) Use Pig to bulk load data. Example:
A = LOAD '/hbasetest.txt' USING PigStorage(',')
    AS (strdata:chararray, intdata:long);
STORE A INTO 'hbase://mydata'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('mycf:intdata');
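To run the Pig snippet above (saved as, say, bulkload.pig — the filename is an assumption), the table and column family named in the STORE must exist first:

```shell
# Create the target table and column family from the hbase shell:
#   create 'mydata', 'mycf'

# Run the script; Pig turns the STORE into a MapReduce job that writes
# one cell per input line into the 'mycf' column family. On a single-node
# setup you can also use local mode: pig -x local bulkload.pig
pig bulkload.pig
```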
3) Do it programmatically using the HBase API. I have a small project called hbaseloader that loads files into an HBase table (the table has just one ColumnFamily holding the content of the file). Take a look at it; you just need to define the structure of your table and modify the code to read a csv file and parse it.
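The parsing step mentioned above could be sketched like this. Note this is a sketch, not the project's actual code: the class and method names are made up, and the commented-out HBase calls assume the hbase-client API is on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

/** Parses one line of the loan CSV into the columns named in the question. */
public class LoanCsvParser {

    static final String[] COLUMNS = {
        "loan_number", "borrower_name", "current_distribution_date", "loan_amount"
    };

    /** Returns a column-name -> value map; loan_number doubles as the row key. */
    public static Map<String, String> parseLine(String line) {
        String[] fields = line.split(",", -1);
        if (fields.length != COLUMNS.length) {
            throw new IllegalArgumentException(
                "Expected " + COLUMNS.length + " fields, got " + fields.length);
        }
        Map<String, String> row = new HashMap<>();
        for (int i = 0; i < COLUMNS.length; i++) {
            row.put(COLUMNS[i], fields[i].trim());
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, String> row = parseLine("1001,Jane Doe,2013-06-01,250000");
        // With the HBase client API, each parsed line would become a Put
        // keyed by loan_number, roughly:
        //   Put put = new Put(Bytes.toBytes(row.get("loan_number")));
        //   put.add(Bytes.toBytes("cf"), Bytes.toBytes("borrower_name"),
        //           Bytes.toBytes(row.get("borrower_name")));
        //   table.put(put);
        System.out.println(row.get("borrower_name"));
    }
}
```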
4) Do it programmatically using a MapReduce job, as in the example you mentioned.