Import data from HDFS to HBase (cdh3u2)


Question


I have installed Hadoop and HBase (cdh3u2). In HDFS I have a file at the path /home/file.txt. It contains data like:

one,1
two,2
three,3

I want to import this file into HBase. The first field should be parsed as a String and the second field as an integer, and then the row should be pushed into HBase. Help me to do this.

Thanks in advance....

Solution

I like using Apache Pig for ingest into HBase because it is simple, straightforward, and flexible.

Here is a Pig script that would do the job for you, after you have created the table and the column family. To create the table and the column family, you'll do:

$ hbase shell
> create 'mydata', 'mycf'

Move the file to HDFS:

$ hadoop fs -put /home/file.txt /user/surendhar/file.txt

Then, write the Pig script to store the data with HBaseStorage (you may have to look up how to set up and run Pig):

A = LOAD 'file.txt' USING PigStorage(',') as (strdata:chararray, intdata:long);
STORE A INTO 'hbase://mydata'
        USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
              'mycf:intdata');

Note that in the above script, the key is going to be strdata. If you want to create your own key from something, use a FOREACH statement to generate the key. HBaseStorage assumes that the first thing in the previous relation (A::strdata in this case) is the key.
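
For instance, a sketch of generating a custom row key with FOREACH before storing (the `row_` prefix and the relation name B here are illustrative, not part of the original answer):

```pig
A = LOAD 'file.txt' USING PigStorage(',') AS (strdata:chararray, intdata:long);
-- Build a prefixed row key; the first field of B becomes the HBase row key.
B = FOREACH A GENERATE CONCAT('row_', strdata) AS rowkey, intdata;
STORE B INTO 'hbase://mydata'
        USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
              'mycf:intdata');
```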


Some other options would be:

  • Write a Java MapReduce job to do the same thing as above.
  • Interact directly with the HTable client API and put rows in one at a time. This should only be done with much smaller files.
  • Push the data up with the hbase shell using some sort of script (e.g., sed, perl, python) that transforms the lines of csv into shell put commands. Again, this should only be done if the number of records is small.

    $ cat /home/file.txt | transform.pl
    put 'mydata', 'one', 'mycf:intdata', '1'
    put 'mydata', 'two', 'mycf:intdata', '2'
    put 'mydata', 'three', 'mycf:intdata', '3'
    
    $ cat /home/file.txt | transform.pl | hbase shell
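
A minimal Python equivalent of the hypothetical transform.pl above might look like this (the table and column names match the earlier example; the function name is my own):

```python
def to_put_commands(lines, table='mydata', column='mycf:intdata'):
    """Turn 'key,value' CSV lines into hbase shell put commands."""
    commands = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, value = line.split(',', 1)
        # hbase shell syntax: put 'table', 'rowkey', 'family:qualifier', 'value'
        commands.append("put '%s', '%s', '%s', '%s'"
                        % (table, key.strip(), column, value.strip()))
    return commands
```

Wiring it to stdin and piping its output into `hbase shell` mirrors the perl pipeline above.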
    
