avro error on AWS EMR


Problem description



I'm using spark-redshift (https://github.com/databricks/spark-redshift), which uses Avro for data transfer.

Reading from Redshift works, but writing fails with:

Caused by: java.lang.NoSuchMethodError: org.apache.avro.generic.GenericData.createDatumWriter(Lorg/apache/avro/Schema;)Lorg/apache/avro/io/DatumWriter

I tried Amazon EMR 4.1.0 (Spark 1.5.0) and 4.0.0 (Spark 1.4.1). I cannot do

import org.apache.avro.generic.GenericData.createDatumWriter

either; only

import org.apache.avro.generic.GenericData

works. I'm using the Scala shell. I tried downloading several other avro-mapred and avro jars, and tried setting

{"classification":"mapred-site","properties":{"mapreduce.job.user.classpath.first":"true"}},{"classification":"spark-env","properties":{"spark.executor.userClassPathFirst":"true","spark.driver.userClassPathFirst":"true"}}

and adding those jars to the Spark classpath. Possibly Hadoop (EMR) needs to be tuned somehow.
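These classifications can also be supplied when the cluster is created. Below is a minimal sketch using the AWS CLI, mirroring the same settings tried above; the release label and application list are assumptions, and the `create-cluster` call is only printed, not executed, since the real invocation needs account-specific key names, roles, and instance details:

```shell
# Write the classification overrides tried above to a file.
# Note: the EMR CLI/API spells the keys "Classification"/"Properties".
cat > configs.json <<'EOF'
[
  {
    "Classification": "mapred-site",
    "Properties": { "mapreduce.job.user.classpath.first": "true" }
  },
  {
    "Classification": "spark-env",
    "Properties": {
      "spark.executor.userClassPathFirst": "true",
      "spark.driver.userClassPathFirst": "true"
    }
  }
]
EOF

# Print (rather than run) a hypothetical create-cluster call that picks
# the file up; fill in key name, roles, and instance types before using.
echo aws emr create-cluster \
  --release-label emr-4.1.0 \
  --applications Name=Spark \
  --configurations file://./configs.json
```

Passing the file at creation time applies the settings to every node, which avoids editing `mapred-site.xml` or `spark-env.sh` on each host by hand.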

Does this ring a bell to anyone?

Solution

Just for reference, a workaround by Alex Nastetsky:

Delete the jars from the master node:

find / -name "*avro*jar" 2> /dev/null -print0 | xargs -0 -I file sudo rm file
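Before removing anything, a dry run that only lists the matching jars can show which Avro versions are actually on the node. This sketch adds a `SEARCH_ROOT` convenience variable (not in the original command); `SEARCH_ROOT=/` reproduces the original full-filesystem sweep:

```shell
# List, rather than delete, every matching avro jar.
# SEARCH_ROOT=/ matches the original command; a narrower root is faster.
SEARCH_ROOT="${SEARCH_ROOT:-/usr/lib}"
find "$SEARCH_ROOT" -name "*avro*jar" 2> /dev/null -print0 |
  xargs -0 -I file ls -l file
```

Once the listing shows only the jars you expect to lose, swapping `ls -l` back to `sudo rm` restores the destructive version.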

Delete the jars from the slave nodes:

yarn node -list | sed 's/ .*//g' | tail -n +3 | sed 's/:.*//g' | xargs -I node ssh node 'find / -name "*avro*jar" 2> /dev/null -print0 | xargs -0 -I file sudo rm file'

Setting configs correctly as proposed by Jonathan is worth a shot too.
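The exact configuration Jonathan proposed isn't quoted here, but for a single job the equivalent classpath-precedence knobs can also be passed at submit time. A hedged sketch: the two `userClassPathFirst` keys are standard Spark options, while the jar path and job name below are placeholders, and the script is only written to a file, not executed:

```shell
# Write a hypothetical submit script that forces user-supplied jars
# (e.g. an Avro build matching spark-redshift's expectations) ahead of
# the cluster's bundled ones. Paths are placeholders to adjust.
cat > submit-job.sh <<'EOF'
#!/bin/sh
spark-submit \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  --jars /home/hadoop/avro-1.7.7.jar \
  my-job.jar
EOF
chmod +x submit-job.sh
echo "wrote submit-job.sh"
```

Scoping the override to one job this way avoids flipping classpath precedence cluster-wide, which can break other applications that depend on the bundled jar versions.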
