avro error on AWS EMR


Problem description



I'm using spark-redshift (https://github.com/databricks/spark-redshift), which uses Avro for data transfer.

Reading from Redshift works, but writing fails with:

Caused by: java.lang.NoSuchMethodError: org.apache.avro.generic.GenericData.createDatumWriter(Lorg/apache/avro/Schema;)Lorg/apache/avro/io/DatumWriter

I tried Amazon EMR 4.1.0 (Spark 1.5.0) and 4.0.0 (Spark 1.4.1). I cannot do

import org.apache.avro.generic.GenericData.createDatumWriter

either; only

import org.apache.avro.generic.GenericData

works. I'm using the Scala shell. I tried downloading several other avro-mapred and avro jars, and tried setting

{"classification":"mapred-site","properties":{"mapreduce.job.user.classpath.first":"true"}},{"classification":"spark-env","properties":{"spark.executor.userClassPathFirst":"true","spark.driver.userClassPathFirst":"true"}}

and adding those jars to the Spark classpath. Possibly Hadoop (EMR) needs to be tuned somehow.
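These classifications can also be supplied when the cluster is created. Below is a minimal sketch using the AWS CLI, mirroring the same settings tried above; the release label and application list are assumptions, and the `create-cluster` call is only printed, not executed, since the real invocation needs account-specific key names, roles, and instance details:

```shell
# Write the classification overrides tried above to a file.
# Note: the EMR CLI/API spells the keys "Classification"/"Properties".
cat > configs.json <<'EOF'
[
  {
    "Classification": "mapred-site",
    "Properties": { "mapreduce.job.user.classpath.first": "true" }
  },
  {
    "Classification": "spark-env",
    "Properties": {
      "spark.executor.userClassPathFirst": "true",
      "spark.driver.userClassPathFirst": "true"
    }
  }
]
EOF

# Print (rather than run) a hypothetical create-cluster call that picks
# the file up; fill in key name, roles, and instance types before using.
echo aws emr create-cluster \
  --release-label emr-4.1.0 \
  --applications Name=Spark \
  --configurations file://./configs.json
```

Passing the file at creation time applies the settings to every node, which avoids editing `mapred-site.xml` or `spark-env.sh` on each host by hand.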

Does this ring a bell to anyone?

Solution

Just for reference, a workaround by Alex Nastetsky:

Delete the jars from the master node:

find / -name "*avro*jar" 2> /dev/null -print0 | xargs -0 -I file sudo rm file
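Before removing anything, a dry run that only lists the matching jars can show which Avro versions are actually on the node. This sketch adds a `SEARCH_ROOT` convenience variable (not in the original command); `SEARCH_ROOT=/` reproduces the original full-filesystem sweep:

```shell
# List, rather than delete, every matching avro jar.
# SEARCH_ROOT=/ matches the original command; a narrower root is faster.
SEARCH_ROOT="${SEARCH_ROOT:-/usr/lib}"
find "$SEARCH_ROOT" -name "*avro*jar" 2> /dev/null -print0 |
  xargs -0 -I file ls -l file
```

Once the listing shows only the jars you expect to lose, swapping `ls -l` back to `sudo rm` restores the destructive version.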

Delete the jars from the slave nodes:

yarn node -list | sed 's/ .*//g' | tail -n +3 | sed 's/:.*//g' | xargs -I node ssh node 'find / -name "*avro*jar" 2> /dev/null -print0 | xargs -0 -I file sudo rm file'

Setting configs correctly as proposed by Jonathan is worth a shot too.
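The exact configuration Jonathan proposed isn't quoted here, but for a single job the equivalent classpath-precedence knobs can also be passed at submit time. A hedged sketch: the two `userClassPathFirst` keys are standard Spark options, while the jar path and job name below are placeholders, and the script is only written to a file, not executed:

```shell
# Write a hypothetical submit script that forces user-supplied jars
# (e.g. an Avro build matching spark-redshift's expectations) ahead of
# the cluster's bundled ones. Paths are placeholders to adjust.
cat > submit-job.sh <<'EOF'
#!/bin/sh
spark-submit \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  --jars /home/hadoop/avro-1.7.7.jar \
  my-job.jar
EOF
chmod +x submit-job.sh
echo "wrote submit-job.sh"
```

Scoping the override to one job this way avoids flipping classpath precedence cluster-wide, which can break other applications that depend on the bundled jar versions.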
