EMR 5.21, Spark 2.4 - Json4s dependency broken

Problem description

In EMR 5.21, the Spark-HBase integration is broken.
df.write.options().format().save() fails.
The cause is json4s-jackson version 3.5.3, which ships with Spark 2.4 on EMR 5.21.
It works fine on EMR 5.11.2 with Spark 2.2 and json4s-jackson version 3.2.11.
The problem is that this is EMR, so I can't rebuild Spark with a lower json4s version.
Is there any workaround?

py4j.protocol.Py4JJavaError: An error occurred while calling o104.save. : java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods$.parse(Lorg/json4s/JsonInput;Z)Lorg/json4s/JsonAST$JValue;
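
To read the error more easily, the JVM method descriptor in the message can be decoded into plain type names. The following is a minimal sketch, not part of the original post; the helper names `decode_descriptor` and `_decode_type` are hypothetical. It shows that the caller is looking for a `parse` overload taking exactly a `JsonInput` and a `boolean`, which is consistent with a library compiled against a json4s release whose `parse` had that two-parameter shape.

```python
import re

# JVM primitive type codes as they appear in method descriptors.
_PRIMITIVES = {
    "Z": "boolean", "B": "byte", "C": "char", "S": "short",
    "I": "int", "J": "long", "F": "float", "D": "double", "V": "void",
}

# One descriptor token: optional array markers, then an object type or a primitive.
_TYPE = re.compile(r"\[*(?:L[^;]+;|[ZBCSIJFDV])")


def _decode_type(code):
    """Turn one token, e.g. 'Lorg/json4s/JsonInput;', into a Java type name."""
    dims = len(code) - len(code.lstrip("["))
    base = code[dims:]
    if base.startswith("L"):
        name = base[1:-1].replace("/", ".")
    else:
        name = _PRIMITIVES[base]
    return name + "[]" * dims


def decode_descriptor(descriptor):
    """Split '(args)ret' into a list of argument types and a return type."""
    match = re.fullmatch(r"\((.*)\)(.+)", descriptor)
    if match is None:
        raise ValueError("not a method descriptor: %r" % descriptor)
    args = [_decode_type(token) for token in _TYPE.findall(match.group(1))]
    return args, _decode_type(match.group(2))


# Descriptor copied from the NoSuchMethodError above.
args, ret = decode_descriptor("(Lorg/json4s/JsonInput;Z)Lorg/json4s/JsonAST$JValue;")
print(args, "->", ret)
```

If the json4s version on the cluster classpath does not offer a `parse` method with exactly this parameter list, the JVM raises `NoSuchMethodError` at call time, even though the code compiled fine against the older version.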

spark-submit --master yarn \
--jars /usr/lib/hbase/  \
--packages com.hortonworks:shc-core:1.1.3-2.3-s_2.11 \
--repositories http://repo.hortonworks.com/content/groups/public/  \
pysparkhbase_V1.1.py s3://<bucket>/ <Namespace> <Table> <cf> <Key>

Code

import sys
from pyspark.sql.functions import concat
from pyspark import SparkContext
from pyspark.sql import SQLContext, SparkSession

spark = (SparkSession.builder.master("yarn")
         .appName("PysparkHbaseConnection")
         .config("spark.some.config.option", "PyHbase")
         .getOrCreate())
spark.sql("set spark.sql.parquet.compression.codec=uncompressed")
spark.conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

# SHC data source used for the HBase write
data_source_format = 'org.apache.spark.sql.execution.datasources.hbase'

# `file` is not defined in the code shown in the original post
df = spark.read.parquet(file)
df.createOrReplaceTempView("view")
# ... (code elided in the original post)
# Start of the SHC catalog JSON; the column mapping that follows is elided in the original post
cat = '{|"table":{"namespace":"' + namespace + '", "name":"' + name + '", "tableCoder":"' + tableCoder + '", "version":"' + version + '"}, \n|"rowkey":"' + rowkey + '", \n|"columns":{'
# ... (code elided in the original post)
df.write.options(catalog=cat).format(data_source_format).save()

Recommended answer

Downgrading json4s to 3.2.10 resolves it. However, I think this is an SHC bug, and SHC needs to be upgraded.
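
Before and after changing json4s, it can help to confirm which json4s jars are actually present on a node. The following is a rough sketch, not part of the original answer: the function names are hypothetical, it assumes the usual `artifact_scalaVersion-version.jar` file naming, and the directory `/usr/lib/spark/jars` is an assumption that should be adjusted to wherever Spark's jars live on the cluster.

```python
import glob
import os
import re

# Matches file names like 'json4s-jackson_2.11-3.5.3.jar'.
_JAR = re.compile(r"^(json4s-[a-z]+)_(\d+\.\d+)-(\d+(?:\.\d+)*)\.jar$")


def parse_json4s_jar(filename):
    """Return (artifact, scala_version, json4s_version), or None if the name does not match."""
    match = _JAR.match(os.path.basename(filename))
    return match.groups() if match else None


def find_json4s_jars(jar_dir):
    """Map each json4s jar found directly under jar_dir to its parsed version info."""
    found = {}
    for path in sorted(glob.glob(os.path.join(jar_dir, "json4s-*.jar"))):
        info = parse_json4s_jar(path)
        if info is not None:
            found[path] = info
    return found


if __name__ == "__main__":
    # Assumed location of Spark's jars; change this for your cluster.
    for path, (artifact, scala, version) in find_json4s_jars("/usr/lib/spark/jars").items():
        print(artifact, "scala", scala, "json4s", version, path)
```

This only inspects file names, so it reports what is installed in that directory, not which copy wins at runtime when several versions end up on the classpath.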
