Dataproc Spark returns java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.uncompressedLength(Ljava/nio/ByteBuffer;II) when accessing Hive


Problem description

I'm moving from Dataproc 1.2 to 1.3. When I created a new Spark cluster on Dataproc using image version 1.3, I got

HiveMetaException: Metastore schema version is not compatible. Hive Version: 2.3.0, Database Schema Version: 2.1.0 

because of a database schema incompatibility. So I SSHed into the Dataproc master instance and ran

schematool -dbType mysql -upgradeSchemaFrom 2.1.0 
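
For reference, schematool can also report the current metastore schema version, which is a useful sanity check before and after the upgrade. The invocation below assumes the default Hive client configuration on the Dataproc master:

schematool -dbType mysql -info
# should print the metastore connection URL and a line like
# "Metastore schema version: 2.3.0" once the upgrade has run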

Everything worked as expected. I then recreated a new Spark cluster to make sure it wouldn't throw this exception again. However, when I ran

val df = spark.sql("select * from daily_active_user_trx")
df.show

in a Zeppelin notebook and in spark-shell, I got the following error:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 2.0 failed 4 times, most recent failure: Lost task 2.3 in stage 2.0 (TID 249, development-cluster-w-3.c.true-dmp.internal, executor 70): java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.uncompressedLength(Ljava/nio/ByteBuffer;II)I
at org.xerial.snappy.SnappyNative.uncompressedLength(Native Method)
at org.xerial.snappy.Snappy.uncompressedLength(Snappy.java:565)
at org.apache.parquet.hadoop.codec.SnappyDecompressor.decompress(SnappyDecompressor.java:62)
at org.apache.parquet.hadoop.codec.NonBlockedDecompressorStream.read(NonBlockedDecompressorStream.java:51)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at org.apache.parquet.bytes.BytesInput$StreamBytesInput.toByteArray(BytesInput.java:205)
at org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary.<init>(PlainValuesDictionary.java:89)
at org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary.<init>(PlainValuesDictionary.java:72)
at org.apache.parquet.column.Encoding$1.initDictionary(Encoding.java:90)
at org.apache.parquet.column.Encoding$4.initDictionary(Encoding.java:149)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.<init>(VectorizedColumnReader.java:114)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:312)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:258)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:161)
at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:106)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:182)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:106)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.scan_nextBatch$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
at org.apache.spark.sql.execution.columnar.InMemoryRelation$$anonfun$1$$anon$1.hasNext(InMemoryRelation.scala:139)
at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:216)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1092)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1083)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1018)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1083)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:809)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

After googling, I found a similar thread, but it's about CDH.

I tried adding snappy-java-1.1.4.jar to /usr/lib/spark/jars on the master node as suggested, but it didn't work.

Thanks,

Peernat F.

Answer

This is SPARK-24018, which the Dataproc team is currently working on addressing.

I believe that to fix it you need the jar on all workers, not just the master, which is why your fix did not work.
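
To confirm the mismatch (a hypothetical diagnostic, not part of the original answer), you could list the snappy-java jars bundled with Spark on the master and on each worker and compare the versions:

ls -l /usr/lib/spark/jars/snappy*
# a native-method UnsatisfiedLinkError like this one typically means
# the workers are still loading a different snappy-java than the master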

I would recommend a simple initialization action of:

rm -f /usr/lib/spark/jars/snappy*
wget https://repo1.maven.org/maven2/org/xerial/snappy/snappy-java/1.1.2.6/snappy-java-1.1.2.6.jar \
   -P /usr/lib/spark/jars
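
As a sketch of how to wire that in (the cluster name and bucket path below are placeholders), those two lines can be saved as a shell script in Cloud Storage and passed to gcloud at cluster creation time:

gcloud dataproc clusters create my-cluster \
    --image-version 1.3 \
    --initialization-actions gs://my-bucket/fix-snappy.sh
# initialization actions run on every node, master and workers alike,
# which is what replacing the jar cluster-wide requires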

This should roll out to new Dataproc 1.3 images in a couple of weeks, after we are sure we fully understand the issue.
