Which HBase connector for Spark 2.0 should I use?
Problem description
Our stack is composed of Google Cloud Dataproc (Spark 2.0) and Google Cloud Bigtable (HBase 1.2.0), and I am looking for a connector that works with these versions. Spark 2.0 and new Dataset API support is not clear to me for the connectors I have found. The project is written in Scala 2.11 with SBT. Thanks for your help.

Answer

Update: SHC now seems to work with Spark 2 and the Table API. See https://github.com/GoogleCloudPlatform/cloud-bigtable-examples/tree/master/scala/bigtable-shc

Original answer:

I don't believe any of these (or any other existing connector) will do all that you would like today. I would recommend just using the HBase MapReduce APIs with RDD methods like newAPIHadoopRDD (or possibly the spark-hbase-connector?), then manually converting the RDDs into Datasets. This approach is a lot easier in Scala or Java than in Python.

This is an area that the HBase community is working to improve, and Google Cloud Dataproc will incorporate those improvements as they happen.
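As a rough illustration of that RDD-based route, the sketch below reads (rowkey, Result) pairs with newAPIHadoopRDD and manually maps them into a typed Dataset. The table name "my-table", column family "cf", and qualifier "value" are hypothetical, and the Bigtable connection settings are assumed to come from the cluster's HBase client configuration (hbase-site.xml):

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.sql.SparkSession

// Row type for the resulting Dataset (hypothetical schema).
case class CellRow(rowKey: String, value: String)

object HBaseRddToDataset {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("hbase-rdd-to-dataset").getOrCreate()
    import spark.implicits._

    // Standard HBase MapReduce input configuration; connection details are
    // assumed to come from hbase-site.xml (the Bigtable HBase client on Dataproc).
    val hbaseConf = HBaseConfiguration.create()
    hbaseConf.set(TableInputFormat.INPUT_TABLE, "my-table") // hypothetical table name

    // Scan the table as an RDD of (rowkey, Result) pairs via the HBase MapReduce API.
    val hbaseRdd = spark.sparkContext.newAPIHadoopRDD(
      hbaseConf,
      classOf[TableInputFormat],
      classOf[ImmutableBytesWritable],
      classOf[Result])

    // Manually convert the RDD into a typed Dataset.
    val ds = hbaseRdd.map { case (_, result) =>
      CellRow(
        Bytes.toString(result.getRow),
        Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("value"))))
    }.toDS()

    ds.show()
    spark.stop()
  }
}
```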
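For the SHC route mentioned in the update, reads are typically driven by a JSON catalog that maps HBase columns onto a DataFrame schema. Below is a minimal sketch, again with hypothetical table and column names, and without the Bigtable-specific client setup shown in the linked bigtable-shc example:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

object ShcDataFrameRead {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("shc-dataframe-read").getOrCreate()

    // Hypothetical catalog mapping the table "my-table" (column family "cf",
    // qualifier "value") onto a two-column DataFrame schema.
    val catalog =
      """{
        |  "table": {"namespace": "default", "name": "my-table"},
        |  "rowkey": "key",
        |  "columns": {
        |    "rowKey": {"cf": "rowkey", "col": "key", "type": "string"},
        |    "value":  {"cf": "cf", "col": "value", "type": "string"}
        |  }
        |}""".stripMargin

    // SHC exposes HBase tables through the DataFrame reader API.
    val df = spark.read
      .options(Map(HBaseTableCatalog.tableCatalog -> catalog))
      .format("org.apache.spark.sql.execution.datasources.hbase")
      .load()

    df.show()
    spark.stop()
  }
}
```

Either way, once the data is exposed as a DataFrame or Dataset, the rest of the Spark 2.0 pipeline can stay on the structured APIs.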