Writing a Spark RDD to an HBase table using Scala


Problem description

I am trying to write a Spark RDD to an HBase table using Scala (which I haven't used before). The entire code is this:

import org.apache.hadoop.hbase.client.{HBaseAdmin, Result}
import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor}
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.io.ImmutableBytesWritable    
import scala.collection.JavaConverters._
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark._
import org.apache.hadoop.mapred.JobConf
import org.apache.spark.rdd.PairRDDFunctions
import org.apache.spark.SparkContext._
import org.apache.hadoop.mapred.Partitioner
import org.apache.hadoop.hbase.mapred.TableOutputFormat
import org.apache.hadoop.hbase.client._

object HBaseWrite {
   def main(args: Array[String]) {
     val sparkConf = new SparkConf().setAppName("HBaseWrite").setMaster("local").set("spark.driver.allowMultipleContexts","true").set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
     val sc = new SparkContext(sparkConf)
     val conf = HBaseConfiguration.create()
     val outputTable = "tablename"

     System.setProperty("user.name", "hdfs")
     System.setProperty("HADOOP_USER_NAME", "hdfs")
     conf.set("hbase.master", "localhost:60000")
     conf.setInt("timeout", 120000)
     conf.set("hbase.zookeeper.quorum", "localhost")
     conf.set("zookeeper.znode.parent", "/hbase-unsecure")
     conf.setInt("hbase.client.scanner.caching", 10000)
     sparkConf.registerKryoClasses(Array(classOf[org.apache.hadoop.hbase.client.Result]))
     val jobConfig: JobConf = new JobConf(conf,this.getClass)
     jobConfig.setOutputFormat(classOf[TableOutputFormat])
     jobConfig.set(TableOutputFormat.OUTPUT_TABLE,outputTable)
     val x = 12
     val y = 15
     val z = 25
     var newarray = Array(x,y,z)
     val newrddtohbase = sc.parallelize(newarray)
     def convert(a:Int) : Tuple2[ImmutableBytesWritable,Put] = {
          val p = new Put(Bytes.toBytes(a))
          p.add(Bytes.toBytes("columnfamily"),
          Bytes.toBytes("col_1"), Bytes.toBytes(a))
          new Tuple2[ImmutableBytesWritable,Put](new ImmutableBytesWritable(a.toString.getBytes()), p);
     }
     new PairRDDFunctions(newrddtohbase.map(convert)).saveAsHadoopDataset(jobConfig)
     sc.stop()
   }
}

The error I get after calling HBaseWrite.main(Array()) is this:

org.apache.spark.SparkException: Task not serializable

How do I proceed to get this working?

Solution

The `convert` method defined inside `main` is the problem: when a method is passed where a function is expected, the resulting function value captures a reference to its enclosing instance, and Spark then tries (and fails) to serialize that instance when shipping the task to executors. A function literal (a `val` of function type) avoids this. For example, the function literal below takes an Int as argument and returns a Double:

var toDouble: (Int) => Double = a => {
    a.toDouble
}

Calling toDouble(2) returns 2.0.
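The difference can be demonstrated without Spark at all, using plain Java serialization. The sketch below is illustrative: `Owner` and `canSerialize` are hypothetical names standing in for the question's enclosing object, not part of the original code.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Deliberately NOT Serializable; stands in for the driver-side object
// that owns `def convert` in the question.
class Owner {
  def toDoubleMethod(a: Int): Double = a.toDouble
  // Eta-expanding the method produces a function that captures `this`,
  // so serializing it drags the non-serializable Owner along.
  val methodAsFunction: Int => Double = toDoubleMethod _
}

object SerializationDemo {
  // A plain function literal captures nothing and serializes cleanly.
  val toDouble: Int => Double = a => a.toDouble

  def canSerialize(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(obj)
      out.close()
      true
    } catch { case _: NotSerializableException => false }

  def main(args: Array[String]): Unit = {
    println(canSerialize(toDouble))                      // true
    println(canSerialize(new Owner().methodAsFunction))  // false
  }
}
```

Spark performs exactly this kind of serialization on every closure it sends to executors, which is why the `def` version fails with "Task not serializable" while the `val` version works.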

In the same way, you can convert your method to a function literal, as below:

val convert: (Int) => Tuple2[ImmutableBytesWritable, Put] = a => {
    val p = new Put(Bytes.toBytes(a))
    p.add(Bytes.toBytes("columnfamily"), Bytes.toBytes("col_1"), Bytes.toBytes(a))
    new Tuple2[ImmutableBytesWritable, Put](new ImmutableBytesWritable(a.toString.getBytes()), p)
}
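With `convert` defined as a `val`, the write line from the question stays the same; the closure Spark serializes now contains only the self-contained function value. This fragment assumes the surrounding setup from the question (`newrddtohbase`, `jobConfig`) and a running HBase/ZooKeeper to actually execute:

```scala
// `convert` is now a serializable function value, so the task closure
// no longer drags in the enclosing object.
new PairRDDFunctions(newrddtohbase.map(convert)).saveAsHadoopDataset(jobConfig)
```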

