SparkSQL on HBase Tables
Is anybody using SparkSQL on HBase tables directly, the way SparkSQL is used on Hive tables? I am new to Spark. Please guide me on how to connect HBase and Spark, and how to query HBase tables.
As far as I know, there are two ways to connect to HBase tables:
- Connect to HBase directly:
Read the HBase table into an RDD, create a DataFrame from that RDD, and execute SQL on top of it.
I am not going to reinvent the wheel; please see "How to read from hbase using spark". The answer from @iMKanchwala at that link already describes the read. The only extra step is to convert the result into a DataFrame (using toDF) and then follow the SQL approach.
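The toDF step can be sketched roughly as follows in PySpark. This is a minimal sketch, not the linked answer's code: it assumes the HBase read has already produced an RDD of plain tuples, and the helper name `query_as_sql` and its parameters are made up for illustration.

```python
def query_as_sql(spark, rdd, columns, view_name, sql_text):
    """Name the columns of an RDD of tuples, expose it as a view, and run SQL on it.

    spark     -- an existing SparkSession
    rdd       -- RDD of tuples, e.g. (userid, name, email), built from the HBase scan
    columns   -- list of column names, in the same order as the tuple fields
    view_name -- name under which the DataFrame becomes visible to SQL
    sql_text  -- the SQL statement to run
    """
    # toDF is available on RDDs once a SparkSession (or SQLContext) exists.
    df = rdd.toDF(columns)
    # Spark 2.x and later; on Spark 1.x the equivalent is df.registerTempTable(view_name).
    df.createOrReplaceTempView(view_name)
    return spark.sql(sql_text)
```

With a session `spark` and an RDD `hbase_rdd` of `(userid, name, email)` tuples, a call such as `query_as_sql(spark, hbase_rdd, ["userid", "name", "email"], "users", "SELECT name FROM users WHERE userid = 1")` returns a DataFrame with the matching rows.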
- Register the table as a Hive external table with the HBase storage handler; you can then query it from Spark through HiveContext. This is also an easy way.
For example:
CREATE TABLE users(
userid int, name string, email string, notes string)
STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" =
"small:name,small:email,large:notes");
For details on how to do that, please see the linked example.
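Once the Hive table exists, querying it from Spark might look like the sketch below. It assumes Spark 2.x or later, where `SparkSession.builder.enableHiveSupport()` takes the place of `HiveContext`, and it assumes the Hive HBase handler and HBase client jars are on Spark's classpath. The function names and the sample query are illustrative only.

```python
def build_hive_session(app_name="hbase-via-hive"):
    """Create a SparkSession that can see tables registered in the Hive metastore."""
    # Imported lazily so this module still loads where pyspark is not installed.
    from pyspark.sql import SparkSession
    return (SparkSession.builder
            .appName(app_name)
            .enableHiveSupport()  # Spark 1.x equivalent: HiveContext(sc)
            .getOrCreate())


def users_with_notes(spark, table="users"):
    """Select the users that have a non-empty notes column in the HBase-backed table."""
    query = "SELECT userid, name, email FROM {} WHERE notes IS NOT NULL".format(table)
    return spark.sql(query)
```

Calling `users_with_notes(build_hive_session()).show()` would print the matching rows; Hive's storage handler does the HBase scan, so no HBase-specific code is needed on the Spark side.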
I would prefer approach 1.
Hope that helps...