Read data from HBase by using Spark with Java


Problem description



I want to access HBase via Spark using Java. I have not found any examples for this besides this one. In the answer it is written:

You can also write this in Java

I copied this code from How to read from hbase using spark:

import org.apache.hadoop.hbase.client.{HBaseAdmin, Result}
import org.apache.hadoop.hbase.{ HBaseConfiguration, HTableDescriptor }
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.io.ImmutableBytesWritable

import org.apache.spark._

object HBaseRead {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("HBaseRead").setMaster("local[2]")
    val sc = new SparkContext(sparkConf)
    val conf = HBaseConfiguration.create()
    val tableName = "table1"

    System.setProperty("user.name", "hdfs")
    System.setProperty("HADOOP_USER_NAME", "hdfs")
    conf.set("hbase.master", "localhost:60000")
    conf.setInt("timeout", 120000)
    conf.set("hbase.zookeeper.quorum", "localhost")
    conf.set("zookeeper.znode.parent", "/hbase-unsecure")
    conf.set(TableInputFormat.INPUT_TABLE, tableName)

    val admin = new HBaseAdmin(conf)
    if (!admin.isTableAvailable(tableName)) {
      val tableDesc = new HTableDescriptor(tableName)
      admin.createTable(tableDesc)
    }

    val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat], classOf[ImmutableBytesWritable], classOf[Result])
    println("Number of Records found : " + hBaseRDD.count())
    sc.stop()
  }
}
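
A direct Java translation of the Scala snippet above would look roughly like this (a minimal sketch; the table name "table1" and the ZooKeeper settings are the same assumptions as in the Scala version, so adjust them to your cluster):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class HBaseRead {
    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setAppName("HBaseRead").setMaster("local[2]");
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);

        // Point TableInputFormat at the table to scan.
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "localhost");
        conf.set("zookeeper.znode.parent", "/hbase-unsecure");
        conf.set(TableInputFormat.INPUT_TABLE, "table1");

        // Java counterpart of sc.newAPIHadoopRDD in the Scala code.
        JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD = jsc.newAPIHadoopRDD(
                conf, TableInputFormat.class, ImmutableBytesWritable.class, Result.class);

        System.out.println("Number of Records found : " + hBaseRDD.count());
        jsc.stop();
    }
}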

Can anyone give me some hints on how to find the correct dependencies, objects and so on?

It seems like HBaseConfiguration is in hbase-client, but I am actually stuck on TableInputFormat.INPUT_TABLE. Shouldn't this be in the same dependency?
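
(For reference: in HBase 1.x, org.apache.hadoop.hbase.mapreduce.TableInputFormat ships in the hbase-server artifact rather than hbase-client; in HBase 2.x it moved to hbase-mapreduce. So a dependency along these lines is needed in addition to hbase-client — the version here is only an example, match it to your cluster:

<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-server</artifactId>
    <version>1.2.0</version>
</dependency>
)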

Is there a better way to access HBase with Spark?

Solution

Yes, there is: use SparkOnHBase from Cloudera.

<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-spark</artifactId>
    <version>1.2.0-cdh5.7.0</version>
</dependency>
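
Note that cdh-suffixed artifacts are published in Cloudera's Maven repository rather than Maven Central, so the repository most likely has to be declared as well (the URL below is Cloudera's usual repo; verify it for your environment):

<repositories>
    <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
</repositories>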

Then use an HBase Scan to read the data from your HBase table (or a bulk Get if you know the keys of the rows you want to retrieve).

import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.spark.JavaHBaseContext; // from the hbase-spark artifact above
import org.apache.spark.api.java.JavaRDD;
import scala.Tuple2;
import scala.Tuple3;

// jsc is your JavaSparkContext; tableName is the HBase table to read
Configuration conf = HBaseConfiguration.create();
conf.addResource(new Path("/etc/hbase/conf/core-site.xml"));
conf.addResource(new Path("/etc/hbase/conf/hbase-site.xml"));
JavaHBaseContext hbaseContext = new JavaHBaseContext(jsc, conf);

Scan scan = new Scan();
scan.setCaching(100); // rows fetched per RPC; tune for your workload

JavaRDD<Tuple2<byte[], List<Tuple3<byte[], byte[], byte[]>>>> hbaseRdd =
        hbaseContext.hbaseRDD(tableName, scan);

System.out.println("Number of Records found : " + hbaseRdd.count());
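
To look at the rows themselves, you can walk the RDD according to the element type declared above. A hedged sketch follows: the Tuple2/Tuple3 layout of row key, family, qualifier and value is taken from the signature in the snippet, so verify it against the version of hbase-spark you use (Bytes is org.apache.hadoop.hbase.util.Bytes):

// Assumes the element layout shown above:
// Tuple2<rowKey, List<Tuple3<family, qualifier, value>>>
hbaseRdd.foreach(row -> {
    String rowKey = Bytes.toString(row._1());
    for (Tuple3<byte[], byte[], byte[]> cell : row._2()) {
        System.out.println(rowKey + "  "
                + Bytes.toString(cell._1()) + ":"   // column family
                + Bytes.toString(cell._2()) + " = " // qualifier
                + Bytes.toString(cell._3()));       // cell value
    }
});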
